JP2001283184A

JP2001283184A - Clustering device

Info

Publication number: JP2001283184A
Application number: JP2000091863A
Authority: JP
Inventors: Hiroaki Nakamitsu; 廣晃仲光
Original assignee: Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp
Priority date: 2000-03-29
Filing date: 2000-03-29
Publication date: 2001-10-12

Abstract

(57) [Abstract]
[Problem] To provide a clustering device that can respond to dynamic changes in the data being clustered, using a simple configuration and procedure.
[Solution] A clustering device that classifies input data using clusters comprises a cluster creation device 1 that creates the clusters, a clustering execution device 2 that clusters the input data using the clusters created by the cluster creation device, a clustering result monitoring device 3 that monitors the clustering results of the clustering execution device and identifies misclassified input data, and a storage means 8 that stores the misclassified input data. When at least a predetermined number of data items have accumulated in the storage means, the cluster creation device creates new clusters from those data. The clusters can thus be corrected in response to dynamic changes in the input data, and misclassification is suppressed.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical Field] The present invention relates to a clustering device that classifies a large amount of data into classes according to their similarity, and more particularly to a clustering device that can appropriately cope with dynamic changes in the input data.

[0002]

[Prior Art] Various clustering methods have been proposed. FIG. 6 shows an example of the most common type of clustering device.

[0003] In FIG. 6, 100 denotes a group of prototype data used for learning; 102 and 103 denote cluster A and cluster B, obtained by treating individual items of the prototype data group as initial clusters; 104 denotes the distance between cluster A 102 and cluster B 103; and 105 denotes cluster C, formed by merging cluster A 102 and cluster B 103. 200 denotes the clustering result created from the prototype data, and 201 and 202 denote the finally created clusters Y and Z. 300 denotes a clustering device that uses the clusters; 301 denotes cluster Y, identical to 201; 302 denotes cluster Z, identical to 202; 303 denotes input X, the object to be clustered; and 304 denotes the point of input X 303 in the space in which the clusters exist.

[0004] In this device, the clusters needed by the clustering device are first created, as follows.

[0005] The two closest clusters in the prototype data group 100 for learning are found. If, as a result, cluster A 102 and cluster B 103 are selected, the two are merged into cluster C 105, and clusters A and B are deleted. Cluster C 105 then holds the values of both cluster A 102 and cluster B 103. The same operation, finding the two closest clusters in the prototype data group 100 and merging them, is then repeated. The procedure ends when the total number of clusters reaches 1, or when the distance between the two closest clusters exceeds a certain value.
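The merge procedure of [0005] is ordinary agglomerative (hierarchical) clustering with two stopping rules. The following Python sketch is illustrative only: the patent defines the inter-cluster distance by an expression in FIG. 5 that is not reproduced in this text, so the sketch assumes Euclidean single linkage (the distance between the closest pair of members). The names `agglomerate`, `cluster_distance`, and `max_merge_distance` are hypothetical.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def cluster_distance(a: List[Point], b: List[Point]) -> float:
    # Assumed stand-in for the patent's distance expression: single linkage,
    # i.e. the Euclidean distance between the closest pair of members.
    return min(math.dist(p, q) for p in a for q in b)


def agglomerate(data: List[Point], max_merge_distance: float) -> List[List[Point]]:
    # Every prototype datum starts out as its own initial cluster.
    clusters: List[List[Point]] = [[p] for p in data]
    # Rule 1: stop once only one cluster remains.
    while len(clusters) > 1:
        # Find the closest pair of clusters (i < j).
        best_d, best_i, best_j = math.inf, -1, -1
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = cluster_distance(clusters[i], clusters[j])
                if d < best_d:
                    best_d, best_i, best_j = d, i, j
        # Rule 2: stop when even the closest pair is farther apart than the threshold.
        if best_d > max_merge_distance:
            break
        # Merge: the new cluster keeps every value of both; the two originals are deleted.
        merged = clusters[best_i] + clusters[best_j]
        clusters = [c for k, c in enumerate(clusters) if k not in (best_i, best_j)]
        clusters.append(merged)
    return clusters
```

With `data = [(0, 0), (0, 1), (10, 10), (10, 11)]` and `max_merge_distance=2.0`, the two nearby pairs merge and the procedure stops with two clusters, analogous to clusters Y and Z in FIG. 6.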

[0006] This series of operations yields the clustering result 200 created from the prototype data; the finally merged clusters are cluster Y 201 and cluster Z 202.

[0007] The clustering device 300 performs the actual clustering using these finally merged clusters. When input X 303 is supplied to the clustering device 300 and falls within cluster Y 301, the result is that input X 303 has been clustered into cluster Y 301.

[0008] A method that performs clustering with a neural network called a self-organizing map (SOM: Self-Organizing Map; described in detail in T. Kohonen, "Self-Organization and Associative Memory", Third Edition, Springer-Verlag, Berlin, 1989) is also known (Japanese Patent Laid-Open No. 7-234853). In this method, prototype data are input to the SOM, the neurons forming the SOM are trained, and the trained neurons are classified into clusters. Once the clusters have been formed, supplying input data to the SOM identifies the neuron whose value is closest to the input, and the input data are thereby clustered.

[0009]

[Problems to Be Solved by the Invention] However, because the clustering methods described above form clusters from prototype data, the clusters are biased toward the prototype data alone. As a result, when these clusters are used to cluster actual data, they cannot cope with dynamic changes in the input data.

[0010] That is, when data that should belong to a new class appear over time, conventional methods cannot handle them at all, and such data are misclassified into one of the existing clusters.

[0011] To prevent this misclassification, conventional methods require re-clustering all the data, including the prototype data, which imposes a large workload. Japanese Patent Laid-Open No. 5-205058 discloses a method of correcting clusters when new data are added. However, that method requires that the addition of new data be known, and that the cluster correction due to the added data be signaled from outside; it cannot automatically collect the data to be added or correct the clusters automatically.

[0012] The present invention solves these conventional problems, and its object is to provide a clustering device that can respond to dynamic changes in the data being clustered, using a simple configuration and procedure.

[0013]

[Means for Solving the Problems] Accordingly, the present invention provides a clustering device that classifies input data using clusters, comprising: a cluster creation device that creates the clusters; a clustering execution device that clusters the input data using the clusters created by the cluster creation device; a clustering result monitoring device that monitors the clustering results of the clustering execution device and identifies misclassified input data; and a storage means that stores the misclassified input data. When at least a certain number of data items have accumulated in the storage means, the cluster creation device creates new clusters based on those data.

[0014] The clusters can therefore be corrected in response to dynamic changes in the input data, suppressing misclassification.

[0015]

[Embodiments of the Invention] Embodiments of the present invention are described below with reference to the drawings. The present invention is in no way limited to these embodiments and can be carried out in various forms without departing from its gist.

[0016] (First Embodiment) As shown in FIG. 1, the clustering device of the first embodiment comprises: a prototype data DB 4 that manages the prototype data; a cluster creation device 1 that creates clusters 5 from the prototype data; a clustering execution device 2 that clusters input data 6 using the created clusters 5; a clustering result monitoring device 3 that monitors the clustering results 7 of the clustering execution device 2; and a misclassified input data DB 8 that manages the data judged by the clustering result monitoring device 3 to be misclassified.

[0017] In this device, the cluster creation device 1 generates the clusters 5 from the prototype data DB 4. The clustering execution device 2 clusters the incoming input data 6 using the generated clusters 5 and outputs a clustering result 7. The clustering result monitoring device 3 monitors the output clustering result 7. When it judges that the error contained in the clustering result 7 for the input data 6 is at or above a certain value, so that the data are clearly misclassified, it adds the input data 6 to the misclassified input data DB 8 and counts the number of data items accumulated there. When the number of data items in the misclassified input data DB 8 exceeds a certain number, it instructs the cluster creation device 1 to create clusters from the misclassified input data DB 8.

[0018] The operation of each device is now described in more detail. First, the cluster creation device 1 operates when the clusters 5 have not yet been created and when the clustering result monitoring device 3 instructs it to create clusters.

[0019] When the clusters 5 have not yet been created, each individual item of the prototype data group in the prototype data DB 4 is treated as an initial cluster, and the two closest clusters are searched for. The distance is obtained by Expression (1) in FIG. 5. The two clusters found are merged into a new cluster. The merged clusters are deleted, and the newly created cluster holds all the values of the clusters deleted by the merge. The same operation, finding the two closest clusters in the prototype data DB and merging them, is then repeated. The procedure ends when the total number of clusters reaches 1, or when the distance between the two closest clusters exceeds a certain value.

[0020] Through this series of operations, the clusters 5 are created from the prototype data 4.

[0021] Next, when the cluster creation device 1 receives an instruction to create clusters from the clustering result monitoring device 3, it creates clusters from the misclassified input data DB 8 by the same operation used to create the clusters 5. Of the clusters so created, those containing at least a certain number of values are added to the clusters 5 as new clusters. Finally, the misclassified input data DB 8 is cleared.
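A minimal sketch of the update in [0021], assuming clusters are stored as lists of member vectors. `cluster_fn` is a hypothetical stand-in for the same merge procedure used on the prototype data, and `min_size` for the "certain number" of values a new cluster must contain; neither name comes from the patent.

```python
from typing import Callable, List, Sequence

Point = Sequence[float]
Cluster = List[Point]


def absorb_new_clusters(existing: List[Cluster],
                        misclassified_db: List[Point],
                        cluster_fn: Callable[[List[Point]], List[Cluster]],
                        min_size: int) -> List[Cluster]:
    # Cluster the stored misclassified inputs with the same procedure as the prototypes.
    candidates = cluster_fn(list(misclassified_db))
    # Keep only clusters that contain at least `min_size` values.
    added = [c for c in candidates if len(c) >= min_size]
    existing.extend(added)
    # Finally, clear the misclassified input data DB.
    misclassified_db.clear()
    return added
```

Passing the clustering routine as a parameter keeps the sketch independent of whichever distance Expression (1) actually specifies; small candidate clusters are discarded rather than added.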

[0022] Next, the operation of the clustering execution device 2 is described. Using the clusters 5 created by the cluster creation device 1, it selects the cluster closest to the incoming input data 6. The distance is calculated by Expression (1) in FIG. 5. The selected cluster and the calculated distance, which represents the error, are output as the clustering result 7.
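The selection step in [0022] amounts to a nearest-cluster search that also reports the winning distance as the error. The Python sketch below is hypothetical: in place of Expression (1), which is not reproduced here, it assumes the Euclidean distance from the input to the nearest member of each cluster.

```python
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]


def nearest_cluster(x: Point, clusters: List[List[Point]]) -> Tuple[int, float]:
    """Return (index of the closest cluster, distance); the distance serves as the error."""
    best_index, best_dist = -1, math.inf
    for index, members in enumerate(clusters):
        # Assumed distance: Euclidean distance to the cluster's nearest member.
        d = min(math.dist(x, m) for m in members)
        if d < best_dist:
            best_index, best_dist = index, d
    return best_index, best_dist
```

Returning the distance alongside the index is what lets the monitoring step judge misclassification without recomputing anything.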

[0023] When the error contained in the output clustering result 7, that is, the calculated distance, is at or above a certain value, the clustering result monitoring device 3 adds the input data 6 to the misclassified input data DB 8 and counts the number of items. When the number of data items in the misclassified input data DB 8 exceeds a certain number, it instructs the cluster creation device 1 to create clusters from the misclassified input data DB 8.
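The monitoring rule in [0023] can be sketched as a small stateful object. The class name and the parameters `error_threshold` and `rebuild_count` are hypothetical labels for the patent's unspecified "certain value" and "certain number". Following the text, an error at or above the threshold counts as a misclassification, and re-clustering is requested once the stored count exceeds `rebuild_count`.

```python
from dataclasses import dataclass, field
from typing import List, Sequence


@dataclass
class ClusteringResultMonitor:
    error_threshold: float  # hypothetical: the "certain value" for the error
    rebuild_count: int      # hypothetical: the "certain number" of stored items
    misclassified_db: List[Sequence[float]] = field(default_factory=list)

    def observe(self, x: Sequence[float], error: float) -> bool:
        """Record one clustering result; return True when re-clustering should be requested."""
        if error >= self.error_threshold:
            # Clearly misclassified: keep the input for later cluster creation.
            self.misclassified_db.append(x)
        # The stored count is compared against the trigger on every call.
        return len(self.misclassified_db) > self.rebuild_count
```

The caller is expected to clear `misclassified_db` after new clusters are created, as [0021] describes; otherwise `observe` keeps returning True.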

[0024] As described above, the clustering device of this embodiment can create clusters automatically even during operation, in response to dynamic changes in the input data. Misclassification caused by dynamic changes in the input data is therefore suppressed quickly. Moreover, because clusters are re-created using only the data automatically collected as misclassified data while actual data are being clustered, the clusters can be corrected with little burden.

[0025] (Second Embodiment) The clustering device of the second embodiment creates clusters using a self-organizing map (hereinafter, SOM).

[0026] As shown in FIG. 2, like the first embodiment, this device comprises a prototype data DB 4, a cluster creation device 1, a clustering execution device 2, a clustering result monitoring device 3, and a misclassified input data DB 8. The cluster creation device 1 includes a data input means 11 that inputs the prototype data, a SOM creation means 12 that creates SOM 9, and a cluster generation means 13 that generates clusters using SOM 9. The clustering result monitoring device 3 includes a clustering result monitoring means 31 that monitors the clustering results 7 of the clustering execution device 2, and a SOM correction means 32 that creates SOM 10 from the data in the misclassified input data DB 8.

[0027] In this device, the data input means 11 of the cluster creation device 1 inputs data from the prototype data DB 4, the SOM creation means 12 creates SOM 9 from these data, and the cluster generation means 13 generates the clusters 5 using SOM 9. The clustering execution device 2 clusters the incoming input data 6 using the generated clusters 5 and outputs a clustering result 7. When the clustering result monitoring means 31 of the clustering result monitoring device 3 judges that the error contained in the output clustering result 7 is at or above a certain value, so that the data are clearly misclassified, it adds the input data 6 to the misclassified input data DB 8 and counts the number of items.

[0028] When the number of data items in the misclassified input data DB 8 exceeds a certain number, the SOM correction means 32 creates a new SOM 10 using the data in the misclassified input data DB 8 as input and instructs the cluster generation means 13 to create clusters using SOM 10. In response, the cluster generation means 13 creates clusters using SOM 10 and adds them to the previously created clusters 5.

[0029] The operation of each part is now described in more detail, beginning with the SOM creation means 12.

[0030] As shown in FIG. 4, the SOM consists of neurons 402 arranged in two dimensions, and each neuron has a vector, called a reference vector 403, of the same dimension as the input.

[0031] The SOM creation means 12 creates the SOM by the procedure shown in the flowchart of FIG. 3. Step A1: The learning count T is set to 0. Step A2: Neurons arranged in two dimensions as shown in FIG. 4 are created, and each neuron is given a random reference vector of the same dimension as the input.

[0032] Step A3: The data input means 11 takes one data item at random from the prototype data DB 4.

[0033] Step A4: For this data item, the neuron C whose reference vector satisfies Expression (2) in FIG. 5 is determined.
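Expression (2) in FIG. 5 is not reproduced in this text. Assuming it follows Kohonen's standard formulation, the winning neuron c for an input x is the one whose reference vector m_c is closest to the input among all reference vectors m_i:

```latex
\lVert x - m_c \rVert = \min_i \lVert x - m_i \rVert
```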

[0034] Step A5: The reference vectors of the neurons located in the neighborhood of neuron C are updated according to Expression (3) in FIG. 5.
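Expression (3) is likewise not reproduced here. In Kohonen's formulation with a neighborhood set N_c(t) around the winner c and a learning rate alpha(t), 0 < alpha(t) < 1, that decreases with the learning count t, the update would read as follows; this is an assumption about FIG. 5, not a transcription of it:

```latex
m_i(t+1) =
\begin{cases}
  m_i(t) + \alpha(t)\,\bigl[x(t) - m_i(t)\bigr], & i \in N_c(t),\\
  m_i(t), & i \notin N_c(t).
\end{cases}
```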

[0035] Step A6: When the learning count T reaches the specified number, the procedure ends (Step A8).

[0036] If, in Step A6, the learning count T has not reached the specified number, Step A7 increments T by one and the procedure returns to Step A2.
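Steps A1 to A8 can be sketched in Python with NumPy as below. This is not the patent's implementation: the grid size, the linearly decaying learning rate `alpha0` and neighborhood radius `radius0`, and the Euclidean winner search are assumptions standing in for Expressions (2) and (3). The sketch draws a new sample (Step A3) on every iteration so that the trained reference vectors are kept rather than re-randomized.

```python
from typing import Optional

import numpy as np


def train_som(prototypes: np.ndarray, rows: int, cols: int, max_iter: int,
              alpha0: float = 0.5, radius0: Optional[float] = None,
              seed: int = 0) -> np.ndarray:
    """Train a rows x cols SOM on `prototypes` (shape: n_samples x dim).

    Returns the reference vectors with shape (rows, cols, dim).
    """
    rng = np.random.default_rng(seed)
    dim = prototypes.shape[1]
    # Step A2: neurons on a 2-D grid, each with a random reference vector in [0, 1).
    weights = rng.random((rows, cols, dim))
    # Grid coordinates of every neuron, used to measure nearness on the map.
    grid = np.stack(np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij"), axis=-1)
    if radius0 is None:
        radius0 = max(rows, cols) / 2.0
    # Steps A1, A6, A7: the learning count t runs from 0 to max_iter - 1.
    for t in range(max_iter):
        decay = 1.0 - t / max_iter  # assumed linear decay schedule
        alpha = alpha0 * decay      # learning rate alpha(t)
        radius = radius0 * decay    # neighborhood radius on the grid
        # Step A3: pick one prototype at random.
        x = prototypes[rng.integers(len(prototypes))]
        # Step A4: winner c minimizes ||x - m_i|| (assumed form of Expression (2)).
        dists = np.linalg.norm(weights - x, axis=-1)
        c = np.unravel_index(np.argmin(dists), dists.shape)
        # Step A5: move every neuron within `radius` of c toward x
        # (assumed form of Expression (3)); the winner itself is always included.
        in_neighborhood = np.linalg.norm(grid - np.array(c), axis=-1) <= radius
        weights[in_neighborhood] += alpha * (x - weights[in_neighborhood])
    # Step A8: training finished.
    return weights
```

For example, `train_som(data, rows=10, cols=10, max_iter=10_000)` returns a 10 x 10 grid of reference vectors that a cluster generation step could then merge. Because each update is a convex combination of a reference vector and a sample, the vectors never leave the range spanned by the initial values and the data.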

[0037] Next, the cluster generation means 13 operates when the clusters 5 have not yet been created and when it receives an instruction to create clusters from the SOM correction means 32.

[0038] First, when the clusters 5 have not yet been created, the clusters 5 are created using SOM 9. For each neuron of SOM 9, the neuron whose reference vector satisfies Expression (4) in FIG. 5 is selected, and the selected neurons are treated as initial clusters. The two closest clusters among them are then searched for, the distance being obtained by Expression (1) in FIG. 5. The two clusters found are merged into a new cluster. The merged clusters are deleted, and the newly created cluster holds all the values of the clusters deleted by the merge. The same operation, finding the two closest clusters and merging them, is then repeated. The procedure ends when the total number of clusters reaches 1, or when the distance between the two closest clusters exceeds a certain value.

[0039] Likewise, when notified by the SOM correction means 32 to create clusters, the cluster generation means 13 creates clusters using SOM 10 and adds them to the clusters 5.

[0040] As in the first embodiment, the clustering execution device 2 clusters the input data 6 using the clusters 5 and outputs a clustering result 7. When the clustering result monitoring means 31 judges that the error contained in the output clustering result 7 is at or above a certain value, so that the data are clearly misclassified, it adds the input data 6 to the misclassified input data DB 8 and counts the number of items.

[0041] When the number of data items in the misclassified input data DB 8 exceeds a certain number, the SOM correction means 32 takes the data in the misclassified input data DB 8 as input and, following the flowchart of FIG. 3, creates a small SOM 10 whose map size equals the number of neurons along the vertical or horizontal side of SOM 9. It then clears the misclassified input data DB 8 and instructs the cluster generation means 13 to create clusters. As described above, the cluster generation means 13 creates clusters using SOM 10 and adds them to the existing clusters 5.

[0042] As described above, because the clustering device of this embodiment performs clustering with a SOM, an existing SOM can be applied as is. Moreover, because a very small SOM is used when new clusters are created, processing is fast, which is of great practical benefit. Furthermore, since the new clusters are created from data automatically collected as misclassified data while actual data are being clustered, creating them allows the device to respond to dynamic changes in the input data.

[0043]

[Effects of the Invention] As is clear from the above description, the clustering device of the present invention can quickly create new clusters in response to dynamic changes in the input data and can thereby suppress misclassification caused by such changes.

[0044] Moreover, because the new clusters are created using only the data automatically collected as misclassified data during clustering, the burden of creating them is small.

[0045] A device having means for creating clusters directly from misclassified data has the advantage that clusters can be created automatically even while the device is operating, so it can respond quickly to dynamic changes in the input data.

[0046] A device that performs clustering with a SOM has the advantage that an existing SOM can be applied as is, and, because a very small SOM is used when new clusters are created, processing is fast.

[0047] Consequently, the present invention is effective when applied to devices that cluster input data that change over time. It is extremely effective, for example, in a clustering device of a learning system that classifies students using their time-varying learning results as input data, or in a clustering device that surveys the preferences of viewers accessing Internet home pages.

[Brief description of the drawings]

FIG. 1 is a block diagram showing the configuration of a clustering device according to the first embodiment of the present invention;

FIG. 2 is a block diagram showing the configuration of a clustering device according to the second embodiment of the present invention;

FIG. 3 is a flowchart showing the SOM creation procedure in the second embodiment;

FIG. 4 is a diagram visually representing a SOM;

FIG. 5 is a diagram showing the mathematical expressions; and

FIG. 6 is a diagram showing an example of a conventional clustering device.

[Explanation of symbols]

1 Cluster creation device
2 Clustering execution device
3 Clustering result monitoring device
4 Prototype data DB
5 Cluster
6 Input data
7 Clustering result
8 Misclassified input data DB
9, 10 SOM
11 Data input means
12 SOM creation means
13 Cluster generation means
31 Clustering result monitoring means
32 SOM correction means
100 Prototype data group
102 Cluster A
103 Cluster B
104 Distance
105 Cluster C
200 Clustering result created from prototype data
201 Cluster Y
202 Cluster Z
300 Clustering device using clusters
301 Cluster Y
302 Cluster Z
303 Input X
304 Point of input X
401 SOM
402 Neuron
403 Reference vector

Claims

[Claims]

1. A clustering device that classifies input data using clusters, comprising: a cluster creation device that creates the clusters; a clustering execution device that clusters the input data using the clusters created by the cluster creation device; a clustering result monitoring device that monitors the clustering results of the clustering execution device and identifies misclassified input data; and a storage means that stores the misclassified input data; wherein, when at least a predetermined number of data items have accumulated in the storage means, the cluster creation device creates new clusters based on those data.

2. The clustering device according to claim 1, wherein, when at least a predetermined number of data items have accumulated in the storage means, the cluster creation device automatically creates new clusters from those data and adds them to the clusters already created.

3. The clustering device according to claim 1, wherein the clustering result includes clustering error data.

4. The clustering device according to claim 1, wherein the cluster creation device comprises a self-organizing map creation means that creates a self-organizing map using prototype data as input, and a cluster forming means that partitions the created self-organizing map to form clusters.

5. The clustering device according to claim 4, wherein the clustering result monitoring device comprises a clustering result monitoring means that monitors the clustering results, and a self-organizing map correction means that, when at least a predetermined number of data items have accumulated in the storage means, creates a self-organizing map using those data as input; and wherein, when the self-organizing map correction means creates the self-organizing map, the cluster forming means of the cluster creation device partitions that self-organizing map to create clusters and adds them to the clusters already created.