
TWI434229B - Cluster system and defect type determination device - Google Patents


Info

Publication number
TWI434229B
TWI434229B TW096124741A
Authority
TW
Taiwan
Prior art keywords
cluster
feature
feature quantity
distance
unit
Prior art date
Application number
TW096124741A
Other languages
Chinese (zh)
Other versions
TW200818060A (en)
Inventor
Makoto Kurumisawa
Akio Suguro
Koji Ohnishi
Original Assignee
Asahi Glass Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Asahi Glass Co Ltd
Publication of TW200818060A
Application granted
Publication of TWI434229B


Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F 16/58 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F 16/583 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually, using metadata automatically derived from the content
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G06F 18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 5/00 Computing arrangements using knowledge-based models
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Medical Informatics (AREA)
  • Library & Information Science (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Computational Linguistics (AREA)
  • Databases & Information Systems (AREA)
  • Image Analysis (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Description

Cluster system and defect type determination device

The present invention relates to a cluster system and a defect type determination device that cut out a partial image containing the defective portion from an image of an inspection object, extract feature signals of the defect from that partial image, and classify the defect by type.

Clustering techniques based on the distance between unknown data and learning data, such as the Mahalanobis generalized distance, have conventionally been used. That is, clustering is performed by determining whether the unknown data belongs to a cluster serving as a population learned in advance. For example, the Mahalanobis distances to a plurality of clusters are compared in magnitude to determine to which population's cluster the unknown data belongs (see, for example, Patent Document 1).

Furthermore, to compute the above distance efficiently, a plurality of feature quantities are selected for the clustering process.

Another common approach determines the cluster to which unknown data belongs by voting over the results obtained from a plurality of classifiers, using, for example, recognition results from the outputs of different sensors, or recognition results for the unknown data over different regions of a single image (see, for example, Patent Document 2).

As an application of such clustering to diagnosing a medical condition from parameters obtained from blood test results, that is, clustering by condition, there is a method that forms a pair from every two of the plurality of clusters, determines for each pair which of the two clusters the examined data more closely resembles, and, from the statistics of those determinations, classifies the data into the cluster chosen most often (see, for example, Patent Document 3).

When classifying the defects present on an LCD glass substrate into defect types set in advance, there is also a method that optimizes each feature quantity used for classification in accordance with the discrimination performed at classification time, weights each feature quantity to reflect that optimization, and uses the optimized feature quantities to determine the cluster to which a defect belongs (for example, Patent Document 4).

Patent Document 1: Japanese Laid-Open Patent Publication No. 2005-214682

Patent Document 2: Japanese Laid-Open Patent Publication No. 2001-56861

Patent Document 3: Japanese Laid-Open Patent Publication No. 07-105166

Patent Document 4: Japanese Laid-Open Patent Publication No. 2002-99916

However, the clustering of Patent Document 3 does not optimize each pair, fails to make effective use of the feature quantities that serve as discrimination material, and, as the number of clusters to be discriminated grows, the number of pairs becomes enormous and the time required for the determination processing increases.

Furthermore, the clustering of Patent Document 4 weights the feature quantities based on the determination rate to improve discrimination accuracy, but it has no concept of optimizing the feature quantities per cluster; as with Patent Document 3, the feature quantities cannot be used effectively, so high-accuracy classification cannot be performed.

The present invention was made in view of these circumstances, and provides a cluster system and a defect type determination device that make effective use of the feature quantities extracted from classification target data when assigning it to clusters, classifying the data faster and more accurately than the conventional examples, for example classifying defects on a glass surface into clusters corresponding to defect types.

To solve the above problems, the present invention differs from the conventional example, which computes the distance between the classification target data and every cluster using the same kinds of feature quantities to decide the classification target: here, a set of feature quantities that yields separation between clusters is defined for each cluster, and the distance to each cluster is computed with different feature quantities, so classification is performed with higher accuracy than before.

Since the above feature quantity set is built from the characteristics of the learning data belonging to each cluster, it consists of feature quantities that can distinguish that cluster from the other clusters.

That is, the present invention adopts the following configurations.

The cluster system of the present invention classifies input data, by means of the feature quantities (parameters) the input data possesses, into clusters each formed from a population of learning data. It comprises: a feature quantity set storage unit that stores, for each cluster, a feature quantity set (parameter set) that is a combination of the feature quantities used for classification; a feature quantity extraction unit that extracts predetermined feature quantities from the input data; a distance calculation unit that, for each feature quantity set corresponding to each cluster, calculates the distance between the center of that cluster's population and the input data based on the feature quantities contained in the set, and outputs it as a set distance; and a rank extraction unit that arranges the set distances in ascending order.

In a preferred cluster system of the present invention, a plurality of the above feature quantity sets are defined for each cluster.

A preferred cluster system of the present invention further comprises a cluster classification unit that detects the cluster to which the input data belongs by means of a rule pattern representing a classification criterion that assigns the input data to a cluster according to the rank of each set distance among the set distances obtained for the feature quantity sets.

In a preferred cluster system of the present invention, the cluster classification unit detects the cluster to which the input data belongs from the ranks of the set distances, taking the cluster that has the most set distances in the top ranks as the cluster to which the input data belongs.

In a preferred cluster system of the present invention, the cluster classification unit has a threshold on the number of top ranks; when a cluster's count of top-ranked set distances is at or above the threshold, that cluster is detected as the cluster to which the input data belongs.

In a preferred cluster system of the present invention, the distance calculation unit multiplies each set distance by a correction coefficient defined for the corresponding feature quantity set, normalizing the set distances across feature quantity sets.

A preferred cluster system of the present invention further comprises a feature quantity set creation unit that creates the feature quantity set for each cluster. For each of the many combinations of feature quantities, the feature quantity set creation unit takes the mean of the learning data of a cluster's population as the origin, computes the average of the distances between that origin and each learning datum of the other clusters' populations, and selects the combination yielding the largest average as the feature quantity set used to distinguish that cluster from the other clusters.

The defect type determination device of the present invention comprises any one of the cluster systems described above; the input data is image data of a product defect, and the defects in the image data are classified by defect type using feature quantities representing the defect.

In a preferred defect type determination device of the present invention, the product is a glass article, and the defects of the glass article are classified by defect type.

The defect detection device of the present invention comprises the defect type determination device described above and detects the type of defect in a product.

The manufacturing state determination device of the present invention comprises the defect type determination device described above, classifies product defects, and, from the correspondence between each type and its cause of occurrence, detects the cause of the defect in the manufacturing process.

A preferred manufacturing state determination device of the present invention comprises any one of the cluster systems described above; the input data is feature quantities representing manufacturing conditions in the product's manufacturing process, and the feature quantities are classified according to the manufacturing state of each step of the process.

In a preferred manufacturing state determination device of the present invention, the product is a glass article, and the feature quantities in the glass article's manufacturing process are classified according to the manufacturing state of each step of the process.

The manufacturing state detection device of the present invention comprises the manufacturing state determination device described above and detects the type of manufacturing state in each step of the product's manufacturing process.

The product manufacturing management device of the present invention comprises the manufacturing state determination device described above, detects the type of manufacturing state in each step of the product's manufacturing process, and performs process control in the manufacturing steps according to the control items corresponding to that type.

As described above, according to the present invention, for each target cluster a combination of feature quantities, chosen in advance from the many feature quantities of the classification target data so as to maximize the distance to the other clusters, is used to compute the distance between the classification target data and each cluster, and the data is classified into the cluster with the smallest computed distance; the classification target data can therefore be assigned to the correct cluster more accurately than with conventional techniques.

Furthermore, according to the present invention, a plurality of such combinations are defined for each cluster; the computed distances between the classification target data and all clusters are arranged in ascending order, and the data is classified into the cluster that appears most often within a predetermined number of top-ranked distances, so classification is more accurate than before.
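The rank-and-vote step above can be sketched as follows. This is a minimal illustration, not the patented implementation; the function name, the `(cluster_id, distance)` pair representation, and the tie-breaking by first-found maximum are all assumptions for the sketch.

```python
def classify_by_rank_vote(set_distances, top_k):
    """Assign the input to a cluster by rank voting: sort all set
    distances (one per feature quantity set of every cluster) in
    ascending order, then pick the cluster that appears most often
    among the top_k smallest distances."""
    ranked = sorted(set_distances, key=lambda pair: pair[1])
    top = [cluster for cluster, _ in ranked[:top_k]]
    # majority vote over the top-ranked entries
    return max(set(top), key=top.count)
```

For example, with set distances `[('A', 0.2), ('B', 0.5), ('A', 0.3), ('C', 1.0), ('B', 0.4)]` and `top_k=3`, the three smallest distances belong to A, A, B, so the input is assigned to cluster A.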

The cluster system of the present invention classifies the input data of a classification target, by means of the feature quantities the input data possesses, into clusters each formed with learning data as its population. It has a feature quantity set storage unit that stores, for each cluster, a feature quantity set that is the combination of feature quantities used for classification; the feature quantity extraction unit extracts feature quantities from the input data according to the predetermined feature quantity sets; the distance calculation unit calculates, for each feature quantity set corresponding to each cluster, the distance between the population and the input data based on the feature quantities contained in the set, each as a set distance; and the rank extraction unit arranges the set distances in ascending order, with cluster classification performed according to that order.

[First Embodiment]

A cluster system according to a first embodiment of the present invention will now be described with reference to the drawings. Fig. 1 is a block diagram showing a configuration example of the cluster system according to this embodiment.

As shown in Fig. 1, the cluster system of this embodiment has a feature quantity set creation unit 1, a feature quantity extraction unit 2, a distance calculation unit 3, a feature quantity set storage unit 4, and a cluster database 5.

The feature quantity set storage unit 4 stores, associated with the identification information of each cluster, the feature quantity sets individually defined for each cluster, each representing a combination of the feature quantities of the classification target data. For example, when the classification target data is the set of feature quantities {a, b, c, d}, the feature quantity set of a cluster is defined as a combination of feature quantities such as [a, b], [a, b, c, d], or [c]. In the following description, any combination drawn from the above set of feature quantities, whether all of them, several of them (in the above example, any two or three of the set), or a single one, is defined as a "combination of feature quantities".

Here, when clusters A, B, and C are set as the classification target clusters, the feature quantity set for each cluster is obtained, using the learning data classified into each cluster in advance, as a combination of feature quantities that increases the distance between that cluster and the other clusters, and is stored in the feature quantity set storage unit 4.

For example, the feature quantity set defined for cluster A is the combination of feature quantities that maximizes the distance between the vector formed from the mean values of the feature quantities of the learning data belonging to cluster A and the vector formed from the mean values of the feature quantities of the learning data belonging to the other clusters B and C.

The classification target data and the learning data of each cluster's population consist of the same set of feature quantities.

When computing the distance between the input classification target data and each cluster, the feature quantity extraction unit 2 reads from the feature quantity set storage unit 4 the feature quantity set corresponding to the cluster under computation, extracts from the classification target data the feature quantities corresponding to that combination, and outputs the extracted feature quantities to the distance calculation unit 3.

Using the identification information of the cluster under computation as a key, the distance calculation unit 3 reads from the cluster database 5 the vector formed from the mean values of the feature quantities of that cluster's learning data and, based on the cluster's feature quantity set, computes the distance between the vector formed from the feature quantities extracted from the classification target data and the vector formed from the mean values of the feature quantities of the learning data (the barycentric vector indicating the centroid position of the cluster's learning data).

When computing the above distance, the distance calculation unit 3 standardizes the feature quantity values so that differences in their data units have no effect, normalizing each feature quantity v(i) of the classification target data by the following equation (1).

V(i) = (v(i) - avg.(i)) / std.(i) ... (1)

Here, v(i) is a feature quantity, avg.(i) is the mean of that feature quantity over the learning data in the cluster under computation, std.(i) is the standard deviation of that feature quantity over the learning data in the cluster under computation, and V(i) is the normalized feature quantity. Therefore, when computing a distance, the distance calculation unit 3 must normalize each feature quantity for each feature quantity set.
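Equation (1) is ordinary z-score standardization against the cluster's own learning data. A minimal sketch (function name and array layout are assumptions; features are columns, learning samples are rows):

```python
import numpy as np

def normalize_features(v, learning_data):
    """Equation (1): V(i) = (v(i) - avg.(i)) / std.(i).
    v            -- 1-D array of feature quantities of the target data
    learning_data -- 2-D array, one row per learning datum of the
                     cluster under computation, one column per feature."""
    avg = learning_data.mean(axis=0)  # avg.(i), per-feature mean
    std = learning_data.std(axis=0)   # std.(i), per-feature std. deviation
    return (v - avg) / std
```

Because avg.(i) and std.(i) come from the cluster under computation, the same input vector is normalized differently for every cluster (and for every feature quantity set), exactly as the paragraph above requires.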

The distance calculation unit 3 performs the above normalization for each feature quantity used in the distance computation for the classification target data, using the mean and standard deviation of the corresponding feature quantity of the learning data.

Any of the standardized Euclidean distance using the standardized feature quantities above, the Mahalanobis distance, the Minkowski distance, and the like may be used as the distance.

Here, when the Mahalanobis distance is used, the Mahalanobis squared distance MHD is obtained by the following equation (2).

MHD = (1/n) * (V^T R^(-1) V) ... (2)

Each element V(i) of the matrix V in equation (2) above is the feature quantity obtained from the multidimensional feature quantity v(i) of the unknown data by equation (1), using the mean avg.(i) and standard deviation std.(i) of that feature quantity over the learning data in the cluster. n is the number of degrees of freedom; in this embodiment it is the feature quantity count, the number of feature quantities in the feature quantity set (described later). The Mahalanobis squared distance is thus a value summing the differences of the n transformed feature quantities, and by taking (Mahalanobis squared distance)/n, the average unit distance of the population becomes 1. V^T is the transpose of the matrix V whose elements are the feature quantities V(i), and R^(-1) is the inverse of the correlation matrix R between the feature quantities of the learning data in the cluster.
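Equation (2) can be sketched directly from these definitions. This is a hedged illustration, not the patented implementation; the function name is an assumption, and it presumes the correlation matrix R of the cluster's learning data is invertible:

```python
import numpy as np

def mahalanobis_squared(v_raw, learning_data):
    """Equation (2): MHD = (1/n) * V^T R^(-1) V.
    V holds the unknown data's feature quantities normalized by
    equation (1) against the cluster's learning data; R is the
    correlation matrix of the features over that learning data;
    n is the number of features in the feature quantity set."""
    avg = learning_data.mean(axis=0)
    std = learning_data.std(axis=0)
    V = (v_raw - avg) / std                       # equation (1)
    R = np.corrcoef(learning_data, rowvar=False)  # feature correlations
    n = V.size                                    # degrees of freedom
    return float(V @ np.linalg.inv(R) @ V) / n
```

At the cluster centroid V is the zero vector, so MHD is 0, and the division by n makes the population-average unit distance 1, matching the text above.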

The feature quantity set creation unit 1 calculates, for each cluster, the feature quantity set that the distance calculation unit 3 uses when computing the distance between the classification target data and that cluster, and writes the result, associated with the cluster's identification information, into the feature quantity set storage unit 4.

When calculating a feature quantity set, the feature quantity set creation unit 1 computes, for each cluster, the value λ of a discriminant criterion by equation (3) below, based on the distance between the barycentric vector of the learning data belonging to the target cluster for which the feature quantity set is being generated and the barycentric vector of the learning data belonging to all clusters other than the target cluster. In the following, a combination of feature quantities is described as a feature quantity set.

λ = ω_o ω_i (μ_o - μ_i)^2 / (ω_o σ_o^2 + ω_i σ_i^2) ... (3)

In (3) above, μ_i is the barycentric vector formed from the means of the feature quantities, over the feature quantity set, of the learning data belonging to the target cluster (the in-cluster population). σ_i is the standard deviation of the vectors generated from the feature quantities of the learning data belonging to the in-cluster population. ω_i is the ratio of the number of learning data belonging to the in-cluster population to the number of learning data in all clusters. Similarly, μ_o is the barycentric vector formed from the means of the feature quantities, over the feature quantity set, of the learning data belonging to the clusters other than the target cluster (the out-of-cluster population). σ_o is the standard deviation of the vectors generated from the feature quantities of the learning data belonging to the out-of-cluster population. ω_o is the ratio of the number of learning data belonging to the out-of-cluster population to the number of learning data in all clusters. Here, (μ_o - μ_i) in equation (3) may also use logarithmic or square-root values. When computing each vector, the feature quantity set creation unit 1 uses feature quantities normalized per equation (1). Fixed values, computed in advance from the ratios ω_i and ω_o so that the separation becomes large, may also be set instead.

Then, for each target cluster, the feature quantity set creation unit 1 uses equation (3) above to compute the discriminant criterion value λ, which discriminates the cluster from the others, for any or all combinations of the feature quantities constituting the learning data. The computed discriminant criterion values λ are arranged in descending order, and a ranked list of them is output.

Here, the feature quantity set creation unit 1 stores the combination of feature quantities corresponding to the largest discriminant criterion value λ, as the target cluster's feature quantity set, together with the value of λ, associated with the cluster's identification information, in the feature quantity set storage unit 4.

The discriminant criterion value λ is determined as shown in Fig. 2(a): when the feature quantity set creation unit 1 defines the feature quantity set of each cluster, if the learning data and classification target data have the four feature quantities a, b, c, and d, it computes the discriminant criterion value λ for every combination of those four feature quantities, whether all four, several, or any single one.

The feature quantity set creation unit 1 then selects the combination with the highest value, for example the combination of feature quantities b and c in Fig. 2(a).

Another method of determining the discriminant criterion value λ is the BSS method shown in Fig. 2(b): first compute the discriminant criterion value λ using all n feature quantities contained in the set of classification target data; next, compute λ for each combination of n-1 feature quantities taken from the set of n; then select, among those n-1-feature combinations, the one with the largest λ, and this time compute λ for all combinations of n-2 feature quantities taken from it. The feature quantity set creation unit 1 may thus be configured to remove feature quantities from the set one at a time in this manner, computing λ for the combinations with one fewer feature quantity drawn from the reduced set, and selecting a combination that allows discrimination with a smaller number of feature quantities.
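The backward elimination just described can be sketched as follows. This is an assumed illustration: `score` stands in for the λ computation (e.g. a wrapper around equation (3)), and returning the best combination seen along the elimination path is one reasonable reading of "selecting a combination that allows discrimination with fewer feature quantities".

```python
from itertools import combinations

def bss_select(features, score):
    """BSS sketch (Fig. 2(b)): start from all n features; at each
    step keep the subset with one fewer feature that maximizes the
    discriminant criterion 'score' (a callable mapping a tuple of
    features to lambda); return the best combination seen overall."""
    best = tuple(features)
    history = [(best, score(best))]
    while len(best) > 1:
        best = max(combinations(best, len(best) - 1), key=score)
        history.append((best, score(best)))
    return max(history, key=lambda entry: entry[1])[0]
```

The search evaluates n + (n-1) + ... + 2 subsets rather than all 2^n combinations, which is the point of the method.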

As yet another method of determining the discrimination reference value λ, there is the FSS method shown in Fig. 2(c). The feature quantities contained in the set of classification target data are read out one at a time over all n types, λ is calculated for each single feature quantity, and the feature quantity with the maximum λ is selected. Next, combinations of two feature quantities, each consisting of that feature quantity and one of the remaining feature quantities, are generated, and λ is calculated for each combination; the combination with the maximum λ is then selected from among them. Next, combinations of three feature quantities are generated by adding to that combination each feature quantity not yet contained in it, and λ is calculated for each. The feature quantity set creation unit 1 may thus be configured to proceed in this order: starting from the previous combination with the maximum λ, one feature quantity not present in the combination is added at a time, λ is calculated for each enlarged combination, the combination with the maximum λ is selected, and so on; finally, among all combinations for which λ was calculated, the combination with the maximum λ is selected as the feature quantity set.
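The forward procedure can be sketched symmetrically; again `lam_of` stands in for equation (3), and `forward_select` is an illustrative name:

```python
def forward_select(features, lam_of):
    """FSS sketch of Fig. 2(c): pick the single feature quantity with the
    maximum discrimination reference value, then repeatedly add the one
    absent feature that maximizes the value of the enlarged combination;
    finally return the combination with the overall maximum value."""
    remaining = list(features)
    current = ()
    history = []
    while remaining:
        candidates = [current + (f,) for f in remaining]
        current = max(candidates, key=lam_of)  # best one-larger combination
        remaining.remove(current[-1])
        history.append((current, lam_of(current)))
    return max(history, key=lambda t: t[1])
```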

Next, the effectiveness of selecting, by means of the discrimination reference value λ, the feature quantity set used for clustering is illustrated in Figs. 3 and 4.

Fig. 3 illustrates the case where, from the feature quantities a, b, c, d, and e, the combination of feature quantities a and g, the combination of a and h, and the combination of d and e are extracted as candidate feature quantity sets, and from these combinations a feature quantity set having higher classification performance than the conventional example is selected for cluster 1, cluster 2, and cluster 3.

In Fig. 3, μ1 corresponds to μi, μ2 to μo, σ1 to σi, σ2 to σo, ω1 to ωi, and ω2 to ωo.

Among the above combinations, the combination of feature quantities a and h gives the largest discrimination reference value λ. This combination is used to separate cluster 1 from the other clusters, and the result of classifying cluster 1 against the other clusters (clusters 2 and 3) is confirmed in Fig. 4.

In Fig. 4, the horizontal axis is the value of the log of the Mahalanobis distance calculated using the combination of feature quantities, and the vertical axis is the number of classification target data having the corresponding value (a histogram). Here, the value 1.4 on the horizontal axis means that the log of the Mahalanobis distance is less than 1.4 and at least 1.2 (i.e., the value to the left of 1.4); the same applies to the other values on the horizontal axis. Also, in Fig. 4, 1.4≦ means 1.4 or more. The Mahalanobis distances in Fig. 4 were each calculated, using the feature quantity set corresponding to cluster 1, for the classification target data belonging to cluster 1 and to the other clusters.

Fig. 4(a) shows an example in which the Mahalanobis distance is calculated using the combination of feature quantities a and g, Fig. 4(b) an example using the combination of a and h, and Fig. 4(c) an example using the combination of d and e. Looking at the histograms in Fig. 4, it can be seen that the larger the discrimination reference value λ, the better cluster 1 is separated from the other clusters.
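The binning convention of Fig. 4 (each axis label counts values in a right-open interval below it, with a final "1.4 or more" bin) can be reproduced as follows; the bin width 0.2 and cutoff 1.4 come from the description above, while the function name and the choice of log base 10 are assumptions:

```python
import math

def log_distance_histogram(distances, bin_width=0.2, top=1.4):
    """Bin the log of each Mahalanobis distance: a label v counts values
    in [v - bin_width, v), and values at or above `top` fall into the
    final '>=' bin, as described for Fig. 4."""
    hist = {}
    for d in distances:
        v = math.log10(d)
        if v >= top:
            key = f"{top}<="
        else:
            edge = math.floor(v / bin_width) * bin_width + bin_width
            key = f"{edge:.1f}"  # label the bin by its upper edge
        hist[key] = hist.get(key, 0) + 1
    return hist
```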

Next, the operation of the cluster system according to the embodiment of Fig. 1 will be described with reference to Figs. 5 and 6. Fig. 5 is a flowchart showing an example of the operation of the feature quantity set creation unit 1 in the first embodiment, and Fig. 6 is a flowchart showing an example of the operation of clustering the classification target data.

In the following description, when the classification target data is, for example, a set of feature quantities of a scratch on a glass article, the following feature quantities are obtained from image processing or measurement results: "a: length of the scratch", "b: area of the scratch", "c: width of the scratch", "d: transmittance of a specific region containing the scratched portion", and "e: reflectance of a specific region containing the scratch". The set of feature quantities (hereinafter, feature quantity set) is therefore {a, b, c, d, e}. In this embodiment, the distance used for clustering is calculated as the Mahalanobis distance using normalized feature quantities. Examples of the glass article in this embodiment include plate glass and glass substrates for displays.

A. Feature quantity set creation processing (corresponding to the flowchart of Fig. 5)

The user detects a scratch on the glass, photographs it to obtain image data, extracts feature quantities such as the measured length of the scratched portion from the image data by image processing, and collects feature quantity data consisting of the set of the above feature quantities. Then, for each cluster into which the user wishes to classify data by the cause or shape of the scratch, the feature quantity data is divided into learning data according to information on the cause or shape known in advance, and stored from a processing terminal (not shown) into the cluster database 5 as the population of learning data for each cluster, in association with the identification information of the cluster (step S1).

Next, when a control command to generate a feature quantity set for each cluster is input from the processing terminal, the feature quantity set creation unit 1 reads the population of learning data from the cluster database 5 in association with the identification information of each cluster.

Then, for each cluster, the feature quantity set creation unit 1 calculates the mean and standard deviation of each feature quantity in the within-cluster population, and uses them to calculate the normalized feature quantities of each learning datum from equation (1).
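A minimal sketch of this per-cluster normalization, assuming equation (1) has the usual z-score form (the equation itself is not reproduced in this excerpt, and the function names are illustrative):

```python
def cluster_stats(learning_data, feature):
    """Mean avg.(i) and population standard deviation std.(i) of one
    feature quantity over a cluster's learning-data population."""
    values = [d[feature] for d in learning_data]
    avg = sum(values) / len(values)
    std = (sum((v - avg) ** 2 for v in values) / len(values)) ** 0.5
    return avg, std

def normalize(value, avg, std):
    """Shift by the within-cluster mean and scale by the within-cluster
    standard deviation, in the spirit of equations (1) and (2)."""
    return (value - avg) / std if std else 0.0
```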

Next, the feature quantity set creation unit 1 calculates the discrimination reference value λ by equation (3) for each of all combinations of the feature quantities contained in each feature quantity set.

At this time, for each cluster, the feature quantity set creation unit 1 uses the normalized feature quantities of the within-cluster population to calculate the mean (centroid vector) μi of the vectors formed by the feature quantities corresponding to each feature quantity set, and the standard deviation σi of the vectors of the learning data formed by those feature quantities in the within-cluster population; uses the normalized feature quantities of the outside-cluster population to calculate the mean (centroid vector) μo of the vectors formed by the feature quantities corresponding to each feature quantity set, and the standard deviation σo of the vectors of the learning data formed by those feature quantities in the outside-cluster population; and calculates the ratio ωi of the number of learning data in the within-cluster population to the total number of learning data, and the ratio ωo of the number of learning data in the outside-cluster population to the total number of learning data.

Then, using the centroid vectors μi and μo, the standard deviations σi and σo, and the ratios ωi and ωo, the feature quantity set creation unit 1 calculates by equation (3), for each cluster and for every candidate combination of feature quantities, the discrimination reference value λ that evaluates the separation between that cluster and the other clusters.
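Equation (3) itself is not shown in this excerpt; a plausible Fisher-style shape built from the quantities just listed is sketched below. Treat it as an assumption, not the patent's actual formula:

```python
def discrimination_lambda(mu_i, mu_o, sigma_i, sigma_o, w_i, w_o):
    """Hedged stand-in for the discrimination reference value λ: weighted
    squared separation of the within/outside centroid vectors over the
    weighted within/outside spreads."""
    sep = sum((a - b) ** 2 for a, b in zip(mu_i, mu_o))
    spread = w_i * sigma_i ** 2 + w_o * sigma_o ** 2
    return w_i * w_o * sep / spread if spread else float("inf")
```

Whatever its exact form, a larger λ should mean the within-cluster and outside-cluster populations are further apart relative to their spreads, which is what the sorting in the next step relies on.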

When the calculation of all discrimination reference values λ is completed, the feature quantity set creation unit 1 sorts them in descending order for each cluster, and detects the feature quantity set corresponding to the maximum λ as the feature quantity set, i.e., the combination of feature quantities to be used in the distance calculation when judging whether data belongs to each cluster (step S2).

Next, for use in the distance calculation by the distance calculation unit 3, the feature quantity set creation unit 1 calculates the correlation coefficients R between the feature quantities corresponding to each feature quantity set, and the mean avg.(i) and standard deviation std.(i) of the feature quantities of the learning data in each within-cluster population (step S3).

Next, the feature quantity set creation unit 1 calculates the correction coefficient λ^(-1/2) from the discrimination reference value λ. This correction coefficient λ^(-1/2) standardizes the distances across feature quantity sets: because the distance to the other clusters varies from cluster to cluster, standardization across feature quantity sets is necessary to improve classification accuracy. The correction coefficient need not be λ^(-1/2); log(λ) or simply (μo−μi) may be used instead, and any function of λ that standardizes across feature quantity sets will do.
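The standardization step then reduces to one multiplication per cluster; a sketch (the function name is illustrative):

```python
def corrected_distance(mahalanobis_d, lam):
    """Standardize a per-cluster distance across feature quantity sets by
    the correction coefficient λ^(-1/2). As the text notes, log(λ) or
    (μo − μi) could be substituted for the coefficient."""
    return mahalanobis_d * lam ** -0.5
```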

Further, in equation (3) above, when calculating the centroid vector μo of the feature quantity set of the population outside the target cluster, the learning data of the outside-cluster population is selected as one of the following three kinds.

a. All learning data of the outside-cluster population among the entire learning data

b. Specific learning data of the above outside-cluster population corresponding to the purpose of classification

c. Learning data of the outside-cluster population among the learning data used for feature quantity selection

Here, the classification purpose in b. is to draw a clear distinction from a cluster of particular interest, and the learning data used are those contained in the other clusters from which that distinction is to be made.

Then, in association with the identification information of each cluster, the feature quantity set creation unit 1 stores the feature quantity set, the correction coefficient corresponding to the feature quantity set (the value λ^(-1/2) in this embodiment), the inverse matrix R^(-1), the mean avg.(i), and the standard deviation std.(i) in the feature quantity set storage unit 4 as distance calculation data (step S4).

B. Clustering processing (corresponding to the flowchart of Fig. 6)

When classification target data is input, the feature quantity extraction unit 2 reads the feature quantity set corresponding to each cluster from the feature quantity set storage unit 4 by the identification information of each cluster.

Then, the feature quantity extraction unit 2 extracts, for each cluster, the feature quantities of the types in the read feature quantity set from the classification target data, and stores the extracted feature quantities in its internal storage unit in association with the identification information of each cluster (step S11).

Next, the distance calculation unit 3 reads the mean avg.(i) and standard deviation std.(i) corresponding to each feature quantity from the feature quantity set storage unit 4, normalizes the feature quantities of the classification target data by performing the operation of equation (2) above, and replaces the feature quantities stored in the internal storage unit with the normalized feature quantities.

Then, the distance calculation unit 3 generates the matrix V composed of the elements V(i) obtained as described above, calculates the transposed matrix V^T of the matrix V, sequentially calculates the Mahalanobis distance between the classification target data and each cluster by equation (3), and stores the distances in the internal storage unit in association with the identification information of each cluster (step S12).
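The quadratic form V · R^(-1) · V^T of step S12 can be sketched without a matrix library as follows. Whether the patent additionally divides by the number of feature quantities or takes a square root is not clear from this excerpt, so the function returns the plain quadratic form (the name is illustrative):

```python
def mahalanobis_sq(v, r_inv):
    """Quadratic form V · R⁻¹ · Vᵀ for a row vector v of normalized
    feature quantities and the inverse correlation matrix r_inv of the
    cluster's learning data."""
    n = len(v)
    # first compute R⁻¹ · Vᵀ, then dot the result with V
    rv = [sum(r_inv[i][j] * v[j] for j in range(n)) for i in range(n)]
    return sum(v[i] * rv[i] for i in range(n))
```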

Next, the distance calculation unit 3 multiplies each calculated Mahalanobis distance by the correction coefficient λ^(-1/2) corresponding to the feature quantity set to obtain the corrected distances, and replaces the Mahalanobis distances with them (step S13). The correction coefficient may also be multiplied after taking the log or the square root of the Mahalanobis distance.

Then, the distance calculation unit 3 compares the corrected distances between the clusters in the internal storage unit (step S14), detects the minimum corrected distance, regards the cluster whose identification information corresponds to that corrected distance as the cluster to which the classification target data belongs, and stores the classified classification target data in the cluster database 5 in association with the identification information of the destination cluster (step S15).
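Steps S14–S15 amount to an argmin over the per-cluster corrected distances; a one-line sketch (names are illustrative):

```python
def classify(corrected_distances):
    """Return the identification info of the cluster with the smallest
    corrected distance, i.e. the cluster the datum is assigned to."""
    return min(corrected_distances, key=corrected_distances.get)
```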

[Second Embodiment]

In the first embodiment above, one feature quantity set per cluster was used for clustering. However, as in the second embodiment described below, a plurality of feature quantity sets may be set for each cluster; the Mahalanobis distance corresponding to each feature quantity set is calculated, the corrected distances are obtained and re-sorted in ascending order, and the cluster to which the classification target data belongs may be determined, according to a rule set in advance, from the corrected distances within a specific number of top ranks.

That is, the distance calculation unit 3 in this embodiment detects which cluster the classification target data belongs to by means of rule patterns, i.e., classification criteria of the classification target data for each cluster, set according to the ranks of the distances between the classification target data obtained for each feature quantity set and the clusters.

The configuration of the second embodiment is the same as that of the first embodiment shown in Fig. 1; the same reference numerals are given to the respective components, and only the operations that differ from the first embodiment will be described below using Fig. 7. The second embodiment includes a process of setting the above rule patterns from the learning data. Fig. 7 is a flowchart showing an example of the pattern-learning operation for the distance ranks used to set the rule patterns. Figs. 8 and 9 are flowcharts showing examples of the clustering operation in the second embodiment.

In the first embodiment, when creating the feature quantity sets, the feature quantity set creation unit 1 calculates, for each cluster, the discrimination reference value λ for a plurality of feature quantity sets that are combinations of feature quantities, and sets the feature quantity set corresponding to the maximum of the plurality of obtained values λ as the feature quantity set of each cluster.

In the second embodiment, on the other hand, the feature quantity set creation unit 1 sets, for each cluster and with respect to one or more combinations of other clusters or to all other clusters, the feature quantity set with the maximum λ for each number of combined feature quantities, thereby obtaining a plurality of discrimination reference values λ and setting, for each cluster, a plurality of feature quantity sets for separating it from the other clusters.

Then, the feature quantity set creation unit 1 obtains distance calculation data for each feature quantity set, and stores the plurality of feature quantity sets and the distance calculation data of each feature quantity set in the feature quantity set storage unit 4 in association with the identification information of the cluster.

Then, in Fig. 7, when learning data is input, the feature quantity extraction unit 2 reads the plurality of feature quantity sets corresponding to each cluster from the feature quantity set storage unit 4 by the identification information of each cluster.

Then, the feature quantity extraction unit 2 extracts, for each cluster, the feature quantities of the types in each read feature quantity set from the learning data, and stores the extracted feature quantities per feature quantity set in its internal storage unit in association with the identification information of each cluster (step S21).

Next, the distance calculation unit 3 reads, for each feature quantity set, the mean avg.(i) and standard deviation std.(i) corresponding to the feature quantity from the feature quantity set storage unit 4, normalizes the feature quantities extracted from the learning data by performing the operation of equation (2) above, and replaces the feature quantities stored in the internal storage unit with the normalized feature quantities.

Then, the distance calculation unit 3 generates the matrix V composed of the elements V(i) obtained as described above, calculates the transposed matrix V^T of the matrix V, sequentially calculates the Mahalanobis distance between the learning data and each cluster by equation (3), and stores the distances per feature quantity set in the internal storage unit in association with the identification information of each cluster (step S22).

Next, the distance calculation unit 3 multiplies each calculated Mahalanobis distance by the correction coefficient λ^(-1/2) corresponding to the feature quantity set to obtain the corrected distances, and replaces the Mahalanobis distances with them (step S23).

Then, the distance calculation unit 3 re-sorts the corrected distances between the clusters in the internal storage unit in ascending order (the smaller the corrected distance, the higher its rank); that is, the identification information of the clusters with smaller corrected distances to the data is arranged at higher ranks (step S24).

Next, the distance calculation unit 3 detects the identification information of the clusters corresponding to the corrected distances from the smallest (top rank) down to the n-th rank, and counts the number of occurrences of each cluster's identification information among those n; that is, it performs a voting process on the clusters.
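The voting step can be sketched directly; `vote_top_n` is an illustrative name, and the input is assumed to be the cluster identification info already sorted by ascending corrected distance:

```python
from collections import Counter

def vote_top_n(ranked_cluster_ids, n):
    """Count how often each cluster's identification info appears among
    the n smallest corrected distances (the voting process)."""
    return Counter(ranked_cluster_ids[:n])
```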

Then, the distance calculation unit 3 detects, as a rule pattern, a pattern in the counts of each cluster's identification information for each learning datum that is common to the learning data contained in the same cluster.

For example, when n is set to 10, if a count pattern of five for cluster A, three for cluster B, and two for cluster C is detected for learning data of cluster B, this is set as rule R1.

Further, suppose that for learning data of cluster C, three votes for cluster C are detected while cluster A has seven and cluster B has zero; although such a pattern does not by itself necessarily indicate cluster C, if such cases are common, the data may be regarded as belonging to cluster C whenever the count for cluster C is three or more, regardless of the counts of the other clusters; this is set as rule R2.

Further, when, for learning data of cluster A, cluster A occupies the first and second ranks from the top in the ranking pattern, the data is regarded as belonging to cluster A regardless of the counts of the other clusters, even if the count for cluster B is eight; this is set as rule R3.

As described above, regularities in the counts of each cluster in the learning data classified into the same cluster are detected, and stored internally as a pattern list for the identification information of each cluster. One rule may be set for each cluster, or a plurality of rules may be set. In the above description the distance calculation unit 3 extracts the rule patterns, but the user may also set rule patterns of counts or rankings arbitrarily in order to change the classification accuracy for each cluster.
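One way to represent such a pattern list is as (cluster id, predicate) pairs checked in order; the rules below mirror R1 and R2 from the text for n = 10 (R3, which inspects ranking positions rather than counts, would need the ranked list as a second input). The representation and names are assumptions:

```python
def match_rules(counts, rules):
    """Return the cluster id of the first rule whose predicate accepts
    the vote counts, or None if no rule pattern matches."""
    for cluster_id, predicate in rules:
        if predicate(counts):
            return cluster_id
    return None

rules = [
    # R1: the count pattern A=5, B=3, C=2 indicates cluster B
    ("B", lambda c: (c.get("A"), c.get("B"), c.get("C")) == (5, 3, 2)),
    # R2: three or more votes for C indicate C, regardless of other counts
    ("C", lambda c: c.get("C", 0) >= 3),
]
```

Returning None corresponds to the Fig. 9 variant, where an unmatched datum falls back to the most-voted cluster.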

Depending on the cluster, it may be similar to other clusters in the characteristics of its feature information, and there are cases in which classifying the classification target data based on the relations among a plurality of clusters, that is, on a target pattern of the counts of each cluster or of the ranking from the top, achieves higher accuracy; this embodiment addresses such cases.

Next, the clustering process of the second embodiment, which uses the rules described in the above list, will be described using the flowchart of Fig. 8.

When classification target data is input, the feature quantity extraction unit 2 reads the plurality of feature quantity sets corresponding to each cluster from the feature quantity set storage unit 4 by the identification information of each cluster.

Then, the feature quantity extraction unit 2 extracts, for each cluster, the feature quantities of the types in each read feature quantity set from the classification target data, and stores the extracted feature quantities per feature quantity set in its internal storage unit in association with the identification information of each cluster (step S31).

Next, the distance calculation unit 3 reads, for each feature quantity set, the mean avg.(i) and standard deviation std.(i) corresponding to the feature quantity from the feature quantity set storage unit 4, normalizes the feature quantities extracted from the classification target data by performing the operation of equation (2) above, and replaces the feature quantities stored in the internal storage unit with the normalized feature quantities.

Then, the distance calculation unit 3 generates the matrix V composed of the elements V(i) obtained as described above, calculates the transposed matrix V^T of the matrix V, sequentially calculates the Mahalanobis distance between the classification target data and each cluster by equation (3), and stores the distances per feature quantity set in the internal storage unit in association with the identification information of each cluster (step S32).

Next, the distance calculation unit 3 multiplies each calculated Mahalanobis distance by the correction coefficient λ^(-1/2) corresponding to the feature quantity set to obtain the corrected distances, and replaces the Mahalanobis distances with them (step S33).

Then, the distance calculation unit 3 re-sorts the corrected distances between the clusters in the internal storage unit in ascending order; that is, the identification information of the clusters with smaller corrected distances to the classification target data is arranged at higher ranks (step S34).

After the re-sorting, the distance calculation unit 3 detects the identification information of the clusters corresponding to the corrected distances from the smallest (top rank) down to the n-th rank, and counts the number of occurrences of each cluster's identification information among those n; that is, it performs a voting process on the clusters.

Next, the distance calculation unit 3 performs a matching process to check whether the pattern of counts for each cluster (or the ranking pattern) among the top n ranks of each classification target datum exists in the internally stored list (step S35).

Then, when the result of the above matching shows that a rule pattern matching the target pattern of the classification target data is described in the list, the distance calculation unit 3 determines that the classification target data belongs to the cluster of the identification information corresponding to the matched rule, and classifies the classification target data into that cluster (step S36).

Next, another clustering process of the second embodiment using the rules described in the above list will be described using the flowchart of Fig. 9.

In the other clustering process shown in Fig. 9, the processing of steps S31 to S35 is the same as that shown in Fig. 8; as described for step S35, the distance calculation unit 3 performs the matching process between the rule patterns stored in the list and the target pattern of the classification target data.

然後,距離計算部3,在上述對照結果,檢測出是否檢索到與上述對象模式一致之規則模式,於檢測出檢索到一致之規則模式時,則將處理移至步驟S47,另外,當檢測出無檢索到一致之規則模式時,則將處理移行至步驟S48(步驟S46)。 Then, from the comparison result, the distance calculation unit 3 detects whether a rule pattern matching the target pattern has been found; when a matching rule pattern has been found, the process proceeds to step S47, and when no matching rule pattern has been found, the process proceeds to step S48 (step S46).

當檢測出檢索到一致之規則模式時,距離計算部3則判定該分類對象資料屬於對應於其一致之規則的識別資訊之叢集,將分類對象資料分類成該叢集,對叢集資料庫5,對應於分類目標之叢集之識別資訊,記憶所分類之分類對象資料(步驟S47)。 When a matching rule pattern is found, the distance calculation unit 3 determines that the classification target data belongs to the cluster whose identification information corresponds to the matched rule, classifies the classification target data into that cluster, and stores the classified data in the cluster database 5 in association with the identification information of the destination cluster (step S47).

另外,當檢測出無檢索到一致之規則模式時,距離計算部3則檢測出計數量,即是投票數最多之識別資訊,分類對應於該識別資訊之叢集的分類對象資料。 On the other hand, when no matching rule pattern is found, the distance calculation unit 3 detects the identification information with the largest count, that is, the largest number of votes, and classifies the classification target data into the cluster corresponding to that identification information.

然後,距離計算部3是對叢集資料庫5,對應於歸屬目標之叢集的識別資訊,記憶所分類的分類對象資料(步驟S48)。 Then, the distance calculation unit 3 stores the classified classification target data in the cluster database 5 in association with the identification information of the destination cluster (step S48).

[第3實施形態] [Third embodiment]

上述第2實施形態雖然是以準備與所計算的分類對象資料之各叢集的距離從小(類似性為大)之一方至上位n個的規則模式之列表,藉由是否對應於存在於該列表之規則模式,執行各分類對象資料之叢集化之處理而予以說明,但是即使如以下第3實施形態般,對每個叢集設定多個特徵量集合,運算對應於各個特徵量集合之馬哈朗諾比斯距離,算出補正距離,將上位之特定順位以內之補正距離為多之叢集,當作分類對象資料所屬之叢集亦可。 The second embodiment was described as preparing a list of rule patterns covering the top n clusters, those whose calculated distance to the classification target data is small (similarity is large), and clustering each piece of classification target data according to whether it corresponds to a rule pattern in that list. However, as in the third embodiment below, it is also possible to set a plurality of feature quantity sets for each cluster, compute the Mahalanobis distance corresponding to each feature quantity set, calculate corrected distances, and take the cluster accounting for the most corrected distances within a specific top rank as the cluster to which the classification target data belongs.

以下,第3實施形態之構成是與第1圖所示之第1及第2實施形態相同,對各構成賦予相同符號,使用第10圖僅說明各構成中與第2實施形態不同之動作。在第3實施形態中,並不是自學習資料設定上述規則之處理,而是直接執行第9圖中之步驟S48。第10圖是表示第3實施形態中之叢集的動作例之流程圖。 The configuration of the third embodiment is the same as that of the first and second embodiments shown in Fig. 1; the same reference numerals are given to the respective components, and only the operations differing from the second embodiment are described using Fig. 10. In the third embodiment, the rules are not set from the learning data; instead, step S48 of Fig. 9 is executed directly. Fig. 10 is a flowchart showing an operation example of clustering in the third embodiment.

在該第10圖所示之其他叢集化之處理中,從步驟S31至步驟S34之處理,是與第8圖所示之處理相同,距離計算部3如先前所述,在步驟S34中,以由小至大之順序,重新排列內部記憶部中之各叢集間之補正距離,即是排列成分類對象資料之補正距離為小之叢集資訊成為上位之順序(步驟S34)。 In the other clustering processing shown in Fig. 10, the processing from step S31 to step S34 is the same as that shown in Fig. 8; as described earlier, in step S34 the distance calculation unit 3 rearranges the corrected distances to the clusters held in the internal memory unit in ascending order, that is, so that the cluster information whose corrected distance to the classification target data is small is ranked higher (step S34).

接著,距離計算部3檢測出與從小(上位)之一方至第n位為止之各補正距離對應之叢集的識別資訊,計數其n個所含之每個叢集之識別資訊之數量,即是對各叢集執行投票處理(步驟S55)。 Next, the distance calculation unit 3 detects the cluster identification information corresponding to each corrected distance from the smallest (highest-ranked) down to the n-th, and counts the occurrences of each cluster's identification information among those n, that is, performs a voting process on the clusters (step S55).

然後,距離計算部3是在投票結果中,檢測最多計數值(投票數)之識別資訊,將對應於該識別資訊之叢集,當作分類對象資料所屬之叢集,對叢集資料庫5,對應於歸屬目標之叢集的識別資訊,記憶分類的分類對象資料(步驟S56)。 Then, the distance calculation unit 3 detects, in the voting result, the identification information with the largest count (number of votes), takes the cluster corresponding to that identification information as the cluster to which the classification target data belongs, and stores the classified data in the cluster database 5 in association with the identification information of the destination cluster (step S56).

再者,在距離計算部3對每個識別資訊設定使用者用以預先篩選掉之投票數的臨界值,投票數最多之識別資訊之投票數未達到該臨界值之時,即使執行也不屬於任何叢集之處理亦可。 Alternatively, a threshold on the number of votes, set in advance by the user for screening, may be set in the distance calculation unit 3 for each piece of identification information; when the number of votes of the identification information with the most votes does not reach that threshold, processing may be performed such that the data belongs to no cluster.

例如,對叢集A、B、C之3個叢集,將分類對象資料予以分類之時,相對於叢集A之識別資訊的投票數為5個,相對於叢集B之識別資訊的投票數為3個,相對於叢集C之投票數為2個時,距離計算部3檢測出叢集A作為最多投票數之識別資訊。 For example, when classifying classification target data among the three clusters A, B, and C, if the number of votes for the identification information of cluster A is 5, for cluster B is 3, and for cluster C is 2, the distance calculation unit 3 detects cluster A as the identification information with the most votes.

但是,當將對叢集A之上述臨界值設定為6個時,因相對於叢集A之識別資訊的投票數未達到臨界值,故距離計算部3判定不屬於任何叢集。 However, when the above threshold for cluster A is set to 6, since the number of votes for the identification information of cluster A does not reach the threshold, the distance calculation unit 3 determines that the data belongs to no cluster.

依此,在對於特徵量與其他叢集僅有些許差之叢集的叢集化中,則能夠提高分類對象資料對叢集的分類處理之信賴性。 Accordingly, in the clustering of clusters in which the feature quantity is only slightly different from other clusters, the reliability of the classification processing of the classification object data to the clustering can be improved.
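A minimal sketch of the vote-threshold judgment described above, using the example numbers from the text (A = 5, B = 3, C = 2 votes); the function name and data shapes are illustrative, and the per-cluster thresholds stand for the values the user sets in advance:

```python
def classify_with_threshold(votes, thresholds):
    """votes: {cluster_id: vote count}. thresholds: per-cluster minimum
    vote counts; clusters absent from the dict get no threshold (0)."""
    best = max(votes, key=votes.get)          # identification info with most votes
    if votes[best] < thresholds.get(best, 0):
        return None                           # belongs to no cluster
    return best

votes = {"A": 5, "B": 3, "C": 2}
print(classify_with_threshold(votes, {"A": 6}))  # None: 5 votes < threshold 6
print(classify_with_threshold(votes, {"A": 4}))  # A
```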

[特徵量之變換方法] [Method of transforming feature quantity]

雖然期待各特徵量之母集團為正規分布而執行叢集化,但考慮到根據特徵量之種類(面積、長度等)不同則有不是正規分布,且母集團具有偏斜分布之情形,此時分類對象資料和各叢集之間之距離計算,即是判定分類對象資料和各叢集之類似性時之精度下降。 Clustering is performed in the expectation that the population of each feature quantity follows a normal distribution; however, depending on the type of feature quantity (area, length, etc.), the population may not be normally distributed and may be skewed. In that case, the accuracy of calculating the distance between the classification target data and each cluster, that is, of judging their similarity, decreases.

因此,依特徵量不同,需要執行藉由特定方法來變換母集團之特徵量,使接近於正規分布而提升類似性之判定精度。 Therefore, depending on the feature quantity, the population's feature quantities need to be transformed by a specific method so that they approach a normal distribution, improving the accuracy of the similarity judgment.

作為變換至該正規分布之變換方法,藉由含有藉由log或平方根(√)、立方根(³√)等之n方根或階乘或是數值計算求出之函數的運算式中之任一者來變換特徵量。 As a conversion method for approaching the normal distribution, the feature quantity is transformed by one of the arithmetic expressions involving log, an n-th root such as the square root (√) or cube root (³√), a factorial, or a function obtained by numerical calculation.
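As a sketch, the candidate conversion expressions named above can be held as a table of functions to be tried one after another; the names and the "identity" (no-transform) baseline are assumptions added for illustration:

```python
import math

# Candidate conversion methods: log, square root, cube root, plus an
# assumed identity baseline. Values are expected to be positive.
CANDIDATE_TRANSFORMS = {
    "identity": lambda x: x,
    "log": math.log,
    "sqrt": math.sqrt,
    "cbrt": lambda x: x ** (1.0 / 3.0),
}

def apply_transform(values, name):
    """Transform every feature value of the learning data (step S62)."""
    f = CANDIDATE_TRANSFORMS[name]
    return [f(v) for v in values]

print(apply_transform([1.0, 4.0, 9.0], "sqrt"))  # [1.0, 2.0, 3.0]
```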

以下,使用第11圖說明各特徵量之變換方法的設定處理。第11圖是表示各特徵量之變換方法之設定處理之動作例的流程圖。並且,該變換方法是以叢集所含之各特 徵量單位對每個叢集設定。再者,該變換方法之設定是使用屬於各叢集之學習資料而執行。以下之處理雖然以特徵量集合作成部1執行而予以說明,但是即使將對應於該處理之處理部設置於他處亦可。 Hereinafter, the setting process of the conversion method of each feature amount will be described using FIG. Fig. 11 is a flowchart showing an operation example of setting processing of each feature amount conversion method. Moreover, the transformation method is based on the special features contained in the cluster. The levy unit is set for each cluster. Furthermore, the setting of the conversion method is performed using learning materials belonging to each cluster. The following processing will be described with the feature amount set cooperation unit 1 being executed, but the processing unit corresponding to the processing may be provided elsewhere.

特徵量集合作成部1是以分類對象之叢集之識別資訊作為金鑰,自叢集資料庫讀出該叢集所含之學習資料,算出(正規化處理)各學習資料之特徵量(步驟S61)。 The feature amount set cooperation unit 1 uses the identification information of the cluster of classification objects as a key, reads the learning data included in the cluster from the cluster database, and calculates (normalizes) the feature amount of each learning material (step S61).

接著,特徵量集合作成部1藉由使用執行記憶於內部之特徵量變換之運算式中之任一者,運算所讀出之上述各學習資料,執行特徵量之變換(步驟S62)。 Next, the feature quantity set creation unit 1 transforms the feature quantities by applying one of the internally stored arithmetic expressions for feature quantity conversion to each piece of the read learning data (step S62).

當結束所有學習資料之特徵量之變換時,特徵量集合作成部1算出表示以變換處理所取得之分布是否接近於正規分布之評估值(步驟S63)。 When the conversion of the feature amounts of all the learning materials is completed, the feature amount set cooperation unit 1 calculates an evaluation value indicating whether or not the distribution obtained by the conversion processing is close to the normal distribution (step S63).

接著,特徵量集合作成部1進行檢測是否在記憶於內部,即是當作變換方法之事先所設定的所有運算式,已算出評估值,於檢測出在所有運算式算出特徵量變換而得到之分布之評估值時,將處理前進至步驟S65,另外於檢測出利用所有運算式之特徵量的算出尚未結束時,因執行下一個所設定之運算式之處理,故將處理返回至步驟S62(步驟S64)。 Next, the feature quantity set creation unit 1 checks whether evaluation values have been calculated for all of the internally stored arithmetic expressions, that is, all expressions set in advance as conversion methods. When evaluation values of the distributions obtained by feature quantity conversion have been calculated for all expressions, the process proceeds to step S65; when calculation with all expressions has not yet finished, the process returns to step S62 to process the next set expression (step S64).

當利用所有運算式之特徵量的變換結束時,特徵量集合作成部1檢測出在所設定之運算式中所取得之分布中評估值為最小之分布,即是最接近於正規分布之分布,決定用以作成所檢測出之分布之運算式以作為變換方法,並作為其叢集之特徵量之變換方法設定在內部(步驟S65)。 When transformation by all expressions is finished, the feature quantity set creation unit 1 detects, among the distributions obtained with the set expressions, the one whose evaluation value is smallest, that is, the one closest to a normal distribution, determines the expression that produced the detected distribution as the conversion method, and sets it internally as the conversion method for that feature quantity of the cluster (step S65).

特徵量集合作成部1是對各叢集之每個特徵量執行上述處理,對應於各個叢集之各特徵量而設定變換方法。 The feature amount set cooperation unit 1 performs the above-described processing for each feature amount of each cluster, and sets a conversion method corresponding to each feature amount of each cluster.

接著,使用第12圖說明上述步驟S63中之評估值的計算。第12圖是說明藉由運算式所取得之分布之評估值的處理之動作例的流程圖。 Next, the calculation of the evaluation value in the above step S63 will be described using FIG. Fig. 12 is a flow chart showing an example of the operation of the process of evaluating the evaluation value of the distribution obtained by the arithmetic expression.

特徵量集合作成部1是藉由所設定之運算式變換屬於對象叢集之各學習資料之特徵量(步驟S71)。 The feature amount set cooperation unit 1 converts the feature amounts of the learning materials belonging to the target cluster by the set arithmetic expression (step S71).

變換所有學習資料之特徵量後,特徵量集合作成部1算出以該變換後之特徵量所取得之分布(母集團)之平均值μ及標準偏差σ(步驟S72)。 After the feature quantity of all the learning materials is converted, the feature quantity set cooperation unit 1 calculates the average value μ and the standard deviation σ of the distribution (parent group) obtained by the converted feature quantity (step S72).

然後,特徵量集合作成部1是使用上述母集團之平均值μ和標準偏差σ,由(x-μ)/σ算出z值(1)(步驟S73)。 Then, the feature quantity set cooperation unit 1 calculates the z value (1) from (x - μ) / σ using the average value μ and the standard deviation σ of the parent group (step S73).

接著,特徵量集合作成部1算出上述母集團中之累積確率(步驟S74)。 Next, the feature quantity set creation unit 1 calculates the cumulative probability in the above population (step S74).

算出後,特徵量集合作成部1藉由所求出之母集團中之累積確率,算出之z值(2)以作為標準正規分布之累積分布函數之逆函數之值(步驟S75)。 After the calculation, the feature quantity set creation unit 1 calculates z value (2) as the value of the inverse function of the cumulative distribution function of the standard normal distribution at the cumulative probability obtained for the population (step S75).

然後,特徵量集合作成部1是求出特徵量之分布之兩個z值,即是z值(1)及z值(2)之差,即是分布中之兩個z值之誤差(步驟S76)。 Then, the feature quantity set creation unit 1 obtains the difference between the two z values of the feature quantity distribution, that is, between z value (1) and z value (2), which is the error between the two z values in the distribution (step S76).

當求出z值之誤差時,特徵量集合作成部1算出上述兩個z值之誤差和,即是其誤差之總合(自乘和)以當作評估值(步驟S77)。 When the errors of the z values have been obtained, the feature quantity set creation unit 1 calculates the error sum of the two z values, that is, the total (sum of squares) of those errors, as the evaluation value (step S77).

上述兩個z值之誤差越小,分布越接近正規分布,若無z值之誤差,則為正規分布,另外,分布離正規分布越遠,誤差越大。 The smaller the error of the above two z values, the closer the distribution is to the normal distribution. If there is no error of z value, it is a normal distribution. In addition, the farther the distribution is from the normal distribution, the larger the error.
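The evaluation of steps S71 to S77 can be sketched as follows. The plotting position (i + 0.5)/n used for the cumulative probability is an assumption, since the patent does not specify how the cumulative probability of the population is computed; everything else follows the steps as described:

```python
from statistics import NormalDist, mean, pstdev

def normality_evaluation(values):
    """Sum of squared differences between the standardized values
    (z value (1)) and the standard-normal quantiles of the empirical
    cumulative probabilities (z value (2)). Smaller means closer to a
    normal distribution; zero would mean exactly normal."""
    xs = sorted(values)
    mu, sigma = mean(xs), pstdev(xs)   # step S72: population mean and std
    n = len(xs)
    total = 0.0
    for i, x in enumerate(xs):
        z1 = (x - mu) / sigma          # step S73: z value (1)
        p = (i + 0.5) / n              # step S74: cumulative probability (assumed form)
        z2 = NormalDist().inv_cdf(p)   # step S75: z value (2), inverse CDF
        total += (z1 - z2) ** 2        # steps S76-S77: squared error, summed
    return total
```

Applying this to the distribution produced by each candidate conversion expression and picking the transform with the smallest value corresponds to step S65.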

接著,於執行第1至第3之實施形態中之叢集之處理前,使用第13圖說明分類對象資料之特徵量。第13圖是分類對象資料之特徵量資料之算出動作例的流程圖。 Next, before performing the processing of the clusters in the first to third embodiments, the feature amount of the classification target data will be described using FIG. Fig. 13 is a flow chart showing an example of the operation of calculating the feature amount data of the classification target data.

距離計算部3對應於對各叢集所設定之特徵量集合而自輸入之分類對象資料抽出識別對象之特徵量,執行所有說明之正規化處理(步驟S81)。 The distance calculation unit 3 extracts the feature amount of the recognition target from the input classification target data in accordance with the feature amount set set for each cluster, and performs all the normalization processing described (step S81).

接著,距離計算部3是藉由對該叢集之特徵量所設定之變換方法(運算式),變換分類對象資料中用於對分類對象之叢集分類的特徵量(步驟S82)。 Next, the distance calculation unit 3 converts the feature amount for classifying the classification target in the classification target data by the conversion method (calculation formula) set for the feature amount of the cluster (step S82).

然後,距離計算部3是如第1至第3實施形態所記載般,算出與分類對象之叢集的距離(步驟S83)。 Then, the distance calculating unit 3 calculates the distance from the cluster to be classified as described in the first to third embodiments (step S83).

接著,距離計算部3是藉由對應於各叢集之特徵量而所設定之變換方法,對分類對象之所有叢集,變換特徵量,藉由該變換之特徵量,執行檢測是否計算出與叢集之距離,於檢測到對分類對象之所有叢集求出距離之時,則使處理前進至步驟S85,另外,於檢測出殘留有分類對象之叢集時,則將處理返回至步驟S82(步驟S84)。 Next, the distance calculation unit 3 checks whether, for all target clusters, the feature quantities have been transformed by the conversion methods set for each cluster's feature quantities and the distances to the clusters have been calculated from the transformed feature quantities; when distances have been obtained for all target clusters, the process proceeds to step S85, and when target clusters remain, the process returns to step S82 (step S84).

然後,在第1至第3實施形態之各個中,開始執行自計算距離結束之時點的處理(步驟S85)。 Then, in each of the first to third embodiments, the processing from the point at which the calculation of the distance is completed is started (step S85).
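A rough sketch of the per-cluster loop of steps S82 to S84; both the transform and the distance function are hypothetical callables standing in for the per-cluster conversion method and the distance calculation of the first to third embodiments:

```python
def distances_to_all_clusters(raw_features, clusters):
    """raw_features: the normalized feature quantities of one piece of
    classification target data (step S81 output). clusters maps a cluster
    id to a (transform, distance) pair of callables."""
    result = {}
    for cluster_id, (transform, distance) in clusters.items():
        transformed = [transform(x) for x in raw_features]  # step S82
        result[cluster_id] = distance(transformed)          # step S83
    return result  # step S85 begins once every cluster has a distance
```

For example, with toy stand-ins `{"A": (lambda x: x * 2, sum)}`, the features are doubled and then summed in place of a real Mahalanobis distance.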

藉由上述理由,在本實施形態中所使用之馬哈朗諾比斯距離中,求出分類對象資料和各叢集之間的距離之時,因期待特徵量為正規分布,故母集團之各特徵量之分布越接近於正規分布,則可以求出與各叢集之間正確之距離(類似性),可以期待對各叢集分類之精度。 For the above reasons, when the distance between the classification target data and each cluster is obtained using the Mahalanobis distance employed in this embodiment, the feature quantities are expected to be normally distributed; therefore, the closer the distribution of each feature quantity of the population is to a normal distribution, the more accurately the distance (similarity) to each cluster can be obtained, and good accuracy of classification into each cluster can be expected.

實施例 Example [計算例] [Calculation example]

接著,使用第1、第2及第3實施形態之叢集系統,藉由第14圖所示之樣本資料,確認與以往例之分類精度。可知雖然樣本數量少,但不管所使用之特徵量少,亦取得以往例或該以上之正確率。在該第14圖中,作為叢集,對種類1、種類2及種類3分別各定義10個學習資料,各學習資料具有特徵量a、b、c、d、e、f、g、h之8個。在該例中,自屬於第14圖所示之各叢集之學習資料決定叢集所使用之特徵量集合,接著,同樣使用學習集合以當作分類對象資料,執行叢集化。 Next, using the cluster systems of the first, second, and third embodiments, the classification accuracy was compared with a conventional example using the sample data shown in Fig. 14. Although the number of samples is small, a correct rate equal to or better than the conventional example is obtained even with few feature quantities. In Fig. 14, as clusters, 10 pieces of learning data are defined for each of type 1, type 2, and type 3, and each piece of learning data has eight feature quantities a, b, c, d, e, f, g, and h. In this example, the feature quantity sets used for clustering are determined from the learning data belonging to each cluster shown in Fig. 14, and then the same learning data are used as classification target data to perform clustering.

作為計算結果,第15圖為以往之計算手法,使用特徵量a及g作為特徵量之組合,對叢集1至叢集3之第14圖所示之各學習資料,運算馬哈朗諾比斯距離,表示判定結果。在第15圖(a)中,Cluster1之列為與叢集1的馬哈朗諾比斯距離,Cluster2之列為與叢集2的馬哈朗諾比斯距離,Cluster3之列為與叢集3的馬哈朗諾比斯距離。再者,種類之列實際上表示各學習資料所屬之叢集,判定結果表示學習資料和馬哈朗諾比斯距離為最小之叢集。表示正確分類種類和判定結果之數字為一致之特徵量資料。 As a calculation result, Fig. 15 shows the judgment results of the conventional calculation method, in which the Mahalanobis distance is computed for each piece of learning data shown in Fig. 14 against clusters 1 to 3, using the combination of feature quantities a and g. In Fig. 15(a), the Cluster1 column is the Mahalanobis distance to cluster 1, the Cluster2 column to cluster 2, and the Cluster3 column to cluster 3. The type column indicates the cluster to which each piece of learning data actually belongs, and the judgment result indicates the cluster whose Mahalanobis distance to the learning data is smallest. Feature quantity data for which the type number and the judgment result agree are correctly classified.

在第15圖(b)中,列之號碼表示學習資料實際所屬之叢集,行之號碼表示被判定之叢集。例如,標記R1之「8」是將叢集1之10個叢集的內8個判定為叢集1,標記R2之「2」是將叢集1之10個叢集的內2個判定為叢集3。p0是表示正解和回答的一致率,p1是表示兩者偶然為一致之確率,k為全體補正判定率,藉由以下之式求出。表示該k越高分類精度越高。 In Fig. 15(b), the row number indicates the cluster to which the learning data actually belongs, and the column number indicates the judged cluster. For example, the "8" marked R1 means that 8 of the 10 items of cluster 1 were judged as cluster 1, and the "2" marked R2 means that 2 of the 10 items of cluster 1 were judged as cluster 3. p0 denotes the agreement rate between the correct answer and the judgment, p1 denotes the probability that the two agree by chance, and k is the overall corrected judgment rate, obtained by the following equations. The higher k is, the higher the classification accuracy.

k=(p0-p1)/(1-p1) k =(p0-p1)/(1-p1)

p0=(a+d)/(a+b+c+d) P0=(a+d)/(a+b+c+d)

p1=[(a+b)‧(a+c)+(b+d)‧(c+d)]/(a+b+c+d)²

使用第16圖說明上述式中之a、b、c、d之關係。 The relationship of a, b, c, and d in the above formula will be described using Fig. 16.

屬於叢集1之資料被當作叢集1分類之數量為a,屬於叢集1之資料被當作叢集2分類之數量為b,a+b是表示屬於叢集1之資料數量。再者,同樣屬於叢集2之資料被當作叢集2分類之數量為d,同樣屬於叢集2之資料被當作叢集1分類之數量為c,c+d表示屬於叢集2之資料數。a+c為全資料a+b+c+d內被分類成叢集1的數量,b+d為全資料a+b+c+d內被分類成叢集2之數量。 The number of items belonging to cluster 1 classified as cluster 1 is a, and the number of items belonging to cluster 1 classified as cluster 2 is b; a+b is the number of items belonging to cluster 1. Likewise, the number of items belonging to cluster 2 classified as cluster 2 is d, and the number classified as cluster 1 is c; c+d is the number of items belonging to cluster 2. Out of all a+b+c+d items, a+c is the number classified as cluster 1, and b+d is the number classified as cluster 2.
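Under these definitions, p0, p1, and k can be computed as follows for the two-cluster case (this is the chance-agreement form of Cohen's kappa; the function name is illustrative):

```python
def corrected_judgment_rate(a, b, c, d):
    """a: cluster-1 items judged as cluster 1, b: cluster-1 items judged
    as cluster 2, c: cluster-2 items judged as cluster 1, d: cluster-2
    items judged as cluster 2. Returns the overall corrected judgment
    rate k = (p0 - p1) / (1 - p1)."""
    n = a + b + c + d
    p0 = (a + d) / n                                        # agreement rate
    p1 = ((a + b) * (a + c) + (b + d) * (c + d)) / n ** 2   # chance agreement
    return (p0 - p1) / (1 - p1)

# 8 of 10 correct in each cluster: p0 = 0.8, p1 = 0.5, k = 0.6.
print(corrected_judgment_rate(8, 2, 2, 8))  # 0.6
```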

接著,第17圖使用第1實施形態之計算手法,對叢集1至叢集3之第14圖所示之各學習資料,運算馬哈朗諾比斯距離,表示判定結果。針對該第17圖(a)及(b)之看法,因與第15圖相同,故省略其說明。可知正確率p0、偶然一致之確率p1、全體補正判定率k是與第15圖之以往計算手法相等。在此,使用自上述全體之組合中,選擇每個叢集具有最大判別基準值λ之組合的方法,算出對應於各叢集之特徵量集合。當作對應於叢集1之特徵量集合,是使用特徵量a及h之組合,當作對應於叢集2之特徵量集合是使用特徵量a及d,當作對應於叢集3之特徵量集合是使用特徵量a、g之組合。 Next, Fig. 17 shows the judgment results when the Mahalanobis distance is computed, using the calculation method of the first embodiment, for each piece of learning data shown in Fig. 14 against clusters 1 to 3. Since Figs. 17(a) and (b) are read in the same way as Fig. 15, their explanation is omitted. The correct rate p0, the chance agreement probability p1, and the overall corrected judgment rate k are equal to those of the conventional calculation method of Fig. 15. Here, the feature quantity set corresponding to each cluster is calculated by selecting, from all the combinations described above, the combination having the largest discrimination reference value λ for each cluster. The combination of feature quantities a and h is used as the feature quantity set corresponding to cluster 1, a and d for cluster 2, and a and g for cluster 3.

接著,第18圖使用第2實施形態之計算手法,對叢集1至叢集3之第14圖所示之各學習資料,運算馬哈朗諾比斯距離,表示判定結果。針對該第18圖(a)及(b)之看法,因與第15圖相同,故省略其說明。正確率p0為0.8333,偶然一致的確率p1為0.3333,全體補正判定率k為0.75,可知比起第15圖之以往計算之手法,提升分類精度。在此,使用自上述全體之組合中,選擇在每個叢集具有至上位第3位為止之判別基準值λ之組合的方法,算出對應於各叢集之特徵量集合。使用特徵量a‧h、a‧g、d‧e之3個組合,以當作對應於叢集1之特徵量集合,使用特徵量a‧f、a‧d、a‧b之3個組合以當作對應於叢集2之特徵量集合,使用特徵量e‧g、a‧c、a‧g之3個組合當作對應於叢集3之特徵量。 Next, Fig. 18 shows the judgment results when the Mahalanobis distance is computed, using the calculation method of the second embodiment, for each piece of learning data shown in Fig. 14 against clusters 1 to 3. Since Figs. 18(a) and (b) are read in the same way as Fig. 15, their explanation is omitted. The correct rate p0 is 0.8333, the chance agreement probability p1 is 0.3333, and the overall corrected judgment rate k is 0.75, showing improved classification accuracy over the conventional calculation method of Fig. 15. Here, the feature quantity set corresponding to each cluster is calculated by selecting, from all the combinations described above, the combinations whose discrimination reference values λ rank in the top three for each cluster. The three combinations a‧h, a‧g, and d‧e are used as the feature quantity sets corresponding to cluster 1; a‧f, a‧d, and a‧b for cluster 2; and e‧g, a‧c, and a‧g for cluster 3.

再者,作為投票之判定,是由馬哈朗諾比斯距離少的順序排列,計算由少的進入至第3位的叢集之數量,將最多數量之叢集當作其分類對象資料所屬之叢集。 Further, for the voting judgment, the distances are arranged in ascending order of Mahalanobis distance, the number of times each cluster appears from the smallest up to the third rank is counted, and the cluster with the largest count is taken as the cluster to which the classification target data belongs.

接著,第19圖使用第2實施形態之計算手法,對叢集1至叢集3之第14圖所示之各學習資料,運算馬哈朗諾比斯距離,並且對計算結果之馬哈朗諾比斯距離乘算補正係數(λ)^(-1/2)之後,執行距離之順位排列,表示判定結果。針對觀看該第19圖(a)及(b)之方法,因與第15圖相同,故省略該說明。正確率p0為0.8333,偶然一致的確率p1為0.3333,全體補正判定率k為0.75,可知比起第15圖之以往計算之手法,提升分類精度。在此,使用自上述全體之組合中,選擇在每個叢集具有至上位第3位為止之判別基準值λ之組合的方法,算出對應於各叢集之特徵量集合。使用特徵量a‧h、a‧g、d‧e之3個組合,以當作對應於叢集1之特徵量集合,使用特徵量a‧f、a‧d、a‧b之3個組合以當作對應於叢集2之特徵量集合,使用特徵量e‧g、a‧c、a‧g之3個組合當作對應於叢集3之特徵量。 Next, Fig. 19 shows the judgment results when the Mahalanobis distance is computed, using the calculation method of the second embodiment, for each piece of learning data shown in Fig. 14 against clusters 1 to 3, the computed Mahalanobis distance is then multiplied by the correction coefficient (λ)^(-1/2), and the distances are ranked. Since Figs. 19(a) and (b) are read in the same way as Fig. 15, their explanation is omitted. The correct rate p0 is 0.8333, the chance agreement probability p1 is 0.3333, and the overall corrected judgment rate k is 0.75, showing improved classification accuracy over the conventional calculation method of Fig. 15. Here, the feature quantity set corresponding to each cluster is calculated by selecting, from all the combinations described above, the combinations whose discrimination reference values λ rank in the top three for each cluster. The three combinations a‧h, a‧g, and d‧e are used as the feature quantity sets corresponding to cluster 1; a‧f, a‧d, and a‧b for cluster 2; and e‧g, a‧c, and a‧g for cluster 3.
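The correction by the coefficient (λ)^(-1/2) amounts to the following one-line computation before ranking (names are illustrative):

```python
def corrected_distance(mahalanobis_distance, lam):
    """Multiply the computed Mahalanobis distance by the correction
    coefficient lam ** (-1/2); lam is the discrimination reference
    value λ of the feature quantity set that produced the distance."""
    return mahalanobis_distance * lam ** -0.5

# A feature quantity set with a larger λ (better discrimination) has its
# distance shrunk, ranking it higher when corrected distances are sorted.
print(corrected_distance(2.0, 4.0))  # 1.0
```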

再者,作為投票之判定,是由馬哈朗諾比斯距離少的順序排列,計算由少的進入至第3位的叢集之數量,將最多數量之叢集當作其分類對象資料所屬之叢集。 Further, for the voting judgment, the distances are arranged in ascending order of Mahalanobis distance, the number of times each cluster appears from the smallest up to the third rank is counted, and the cluster with the largest count is taken as the cluster to which the classification target data belongs.

由上述第15、17、18、19圖所示之各分類結果,判斷本實施形態比起以往例,執行高速且高精度之叢集化處理,確認出本實施形態相對於以往例之優越性。 From the respective classification results shown in the above-mentioned 15th, 17th, 18th, and 19th, it is judged that the present embodiment performs high-speed and high-precision clustering processing as compared with the conventional example, and the superiority of the present embodiment with respect to the conventional example is confirmed.

[本發明之應用例] [Application example of the present invention] A.檢查裝置 A. Inspection device

說明如第20圖所示分類被檢查物,例如玻璃基板表面之刮痕種類之檢查裝置(缺陷檢查裝置)。第21圖為說明特徵量集合之選擇之動作例的流程圖,第22圖為說明叢集化處理中之動作例的流程圖。 An inspection apparatus (defect inspection apparatus) that classifies the types of scratches on an object under inspection, for example the surface of a glass substrate, as shown in Fig. 20, will be described. Fig. 21 is a flowchart illustrating an operation example of selecting feature quantity sets, and Fig. 22 is a flowchart illustrating an operation example of the clustering processing.

首先,針對特徵量集合之選擇動作予以說明。第5圖之流程圖中之步驟S1中之學習資料之收集,是對應於第21圖之流程圖之步驟S101至步驟S105。 First, the selection action of the feature amount set will be described. The collection of the learning materials in step S1 in the flowchart of Fig. 5 corresponds to steps S101 to S105 of the flowchart of Fig. 21.

第21圖之步驟S2至步驟S4因與第5圖之流程圖相同,故省略說明。 Steps S2 to S4 of Fig. 21 are the same as those of the flowchart of Fig. 5, and therefore description thereof will be omitted.

藉由操作員之操作,收集分別對應於欲分類刮痕種類之叢集之學習資料用之樣本(步驟S101)。 By the operation of the operator, samples for learning materials respectively corresponding to the clusters of the types of scratches to be classified are collected (step S101).

以照明裝置102照射畫像取得部101當作學習資料所收集之刮痕之形狀,藉由攝影裝置103取得刮痕部份之畫像資料(步驟S102)。 The shapes of the scratches collected as learning data are illuminated by the illumination device 102, and the image acquisition unit 101 acquires image data of the scratched portions through the imaging device 103 (step S102).

然後,自畫像取得部101所取得之畫像資料算出各學習資料之刮痕之特徵量(步驟S103)。 Then, the feature quantities of the scratches of each piece of learning data are calculated from the image data acquired by the image acquisition unit 101 (step S103).

將所取得之學習資料之特徵量各分配至以目視所取得之分類目標,執行各叢集中之學習資料之特定(步驟S104)。 Each of the acquired feature amounts of the learning materials is assigned to the classified target obtained by visual observation, and the specificity of the learning materials in each cluster is executed (step S104).

然後,各叢集之學習資料成為特定數(事先設定之樣本數),例如成為各30個左右,重複自步驟S101至步驟S102為止之處理,當成為特定數時,叢集部105執行所有第5圖中所說明之步驟S2以後之處理。在此,叢集部105為第1或是第2實施形態中之叢集系統。 Then, the processing from step S101 to step S102 is repeated until the learning data of each cluster reaches a specific number (a preset number of samples), for example about 30 each; when the specific number is reached, the clustering unit 105 executes all of the processing from step S2 onward described in Fig. 5. Here, the clustering unit 105 is the cluster system of the first or second embodiment.

接著,參照第22圖,說明第4圖之檢查裝置中之叢集化之處理。在此,因第22圖之步驟S31至步驟S34、 S55及S56與第10圖之流程圖相同,故省略說明。 Next, the processing of the clustering in the inspection apparatus of Fig. 4 will be described with reference to Fig. 22. Here, step S31 to step S34 of FIG. 22, S55 and S56 are the same as the flowchart of Fig. 10, and therefore the description thereof will be omitted.

在第20圖之檢查裝置中,當開始檢查時,對為被檢查物100之玻璃基板,照明裝置102執行照明,攝影裝置103攝影玻璃基板表面而將其攝影畫像輸出至畫像取得部101。依此,缺陷候補檢測部104在自畫像取得部所輸入之攝影畫像中,檢測出與平面形狀不同之部份,設為應分類此之缺陷候補(步驟S201)。 In the inspection apparatus of Fig. 20, when inspection starts, the illumination device 102 illuminates the glass substrate serving as the object under inspection 100, and the imaging device 103 photographs the surface of the glass substrate and outputs the captured image to the image acquisition unit 101. The defect candidate detection unit 104 then detects, in the captured image input from the image acquisition unit, portions differing from a flat shape, and sets them as defect candidates to be classified (step S201).

接著,缺陷候補檢測部104是自攝影畫像切除其缺陷候補部份之畫像資料以當作分類對象資料。 Next, the defect candidate detecting unit 104 cuts out the image data of the defect candidate portion from the photographed image as the classification target data.

然後,缺陷候補檢測部104是自分類對象資料之畫象資料算出特徵量,對叢集部105輸出由所抽出之特徵量之集合所構成之分類對象資料(步驟S202)。 Then, the defect candidate detecting unit 104 calculates the feature amount from the image data of the classification target data, and outputs the classification target data composed of the extracted feature amount to the cluster unit 105 (step S202).

針對之後的叢集化之處理,因在第10圖之步驟中已說明,故省略。如上述般,本發明之檢查裝置可以以高精度將玻璃基板上帶有之刮痕分類成刮痕之每個種類。 The processing for the subsequent clustering is omitted because it is explained in the step of Fig. 10. As described above, the inspection apparatus of the present invention can classify the scratches on the glass substrate into each of the types of scratches with high precision.

B.缺陷種類判定裝置 B. Defect type determination device

第23圖所示之缺陷種類判定裝置,叢集部105對應於既已說明之本發明之叢集系統。 In the defect type determination device shown in Fig. 23, the clustering unit 105 corresponds to the cluster system of the present invention already described.

畫像取得裝置201是由第20圖中之畫像取得部101、照明裝置102及攝影裝置103所構成。 The image acquisition device 201 is composed of the image acquisition unit 101, the illumination device 102, and the imaging device 103 in Fig. 20 .

已取得將分類對象資料予以分類之目標的各叢集之學習資料,並準備於叢集部105之叢集資料庫5。因此,第5圖中之特徵量集合之選擇也已結束。 The learning data of the clusters into which the classification target data are to be classified have already been acquired and prepared in the cluster database 5 of the clustering unit 105. Accordingly, the feature quantity set selection of Fig. 5 has also been completed.

自被安裝在各製造裝置上之畫像取得裝置202所輸入之攝影畫像檢測出缺陷候補,切取其畫像資料,抽出特徵量而輸出至資料收集裝置203。控制裝置200使被輸入至資料收集裝置203之分類對象資料轉送至叢集部105。然後,如已說明般,叢集部105是對對應於刮痕之種類的各叢集分類所輸入之分類對象資料。 Defect candidates are detected from the captured images input from the image acquisition devices 202 mounted on the respective manufacturing apparatuses; their image data are cut out, feature quantities are extracted, and the results are output to the data collection device 203. The control device 200 transfers the classification target data input to the data collection device 203 to the clustering unit 105. Then, as described above, the clustering unit 105 classifies the input classification target data into the clusters corresponding to the types of scratches.

C.製造管理裝置 C. Manufacturing management device

本發明之製造管理裝置是如第24圖所示般，控制裝置300、製造裝置301、302、告知部303、記錄部304、狀況不佳裝置判定部305及缺陷類別判定裝置306所構成。在此，缺陷類別判定裝置306是與上述B項所說明之缺陷類別判定裝置相同。 The manufacturing management device of the present invention is, as shown in Fig. 24, composed of the control device 300, the manufacturing apparatuses 301 and 302, the notification unit 303, the recording unit 304, the poor-condition apparatus determination unit 305, and the defect type determination device 306. Here, the defect type determination device 306 is the same as the defect type determination device described in item B above.

缺陷種類判定裝置306是在所對應之缺陷候補檢測部104中，將來自各設置在製造裝置301及製造裝置302上之畫象取得裝置201、202的攝影畫像予以畫像處理，而抽出特徵量，執行分類對象資料之分類。 The defect type determination device 306 performs, in the corresponding defect candidate detection unit 104, image processing on the captured images from the image acquisition devices 201 and 202 provided on the manufacturing apparatus 301 and the manufacturing apparatus 302, extracts feature quantities, and carries out classification of the classification target data.

接著，狀況不佳裝置判定部305具有表示被分類之叢集之識別資訊，和對應於其叢集之發生要因之關係的表，自上述表讀出自上述缺陷種類判定裝置306輸入之分類目標之叢集之識別資訊的發生要因，判定成為發生要因之製造裝置。即是，狀況不佳裝置判定部305是對應於叢集之識別資訊，檢測出製品之製程中之缺陷的產生要因。 Next, the poor-condition apparatus determination unit 305 has a table relating the identification information of the clusters into which data are classified to the causes corresponding to those clusters; it reads from the table the cause associated with the cluster identification information of the classification target input from the defect type determination device 306, and determines the manufacturing apparatus that is the source of the cause. That is, the poor-condition apparatus determination unit 305 detects, from the cluster identification information, the cause of a defect arising in the manufacturing process of the product.
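A minimal sketch of such a cause table and its lookup is shown below; the cluster identifiers, causes, and apparatus names are hypothetical stand-ins for the entries the table would actually hold.

```python
# Hypothetical table: cluster identification info -> (cause, apparatus).
CAUSE_TABLE = {
    "C01": ("roller flaw", "manufacturing_apparatus_301"),
    "C02": ("foreign particle", "manufacturing_apparatus_302"),
}

def find_cause(cluster_id):
    """Read out the defect cause registered for the cluster a defect was
    classified into, and the manufacturing apparatus presumed responsible."""
    cause, apparatus = CAUSE_TABLE[cluster_id]
    return {"cluster": cluster_id, "cause": cause, "apparatus": apparatus}
```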

然後，狀況不佳裝置判定部305由告知部303通知至操作員，並且於記錄部304對應於所判定之日時，記憶被缺陷分類之叢集之識別號碼、發生原因、和其製造裝置之識別資訊以當作履歷。再者，控制裝置300是停止狀況不佳裝置判定部305判定之製造裝置，控制控制參數。 Then, the poor-condition apparatus determination unit 305 notifies the operator through the notification unit 303, and causes the recording unit 304 to store, as a history in association with the determined date and time, the identification number of the cluster into which the defect was classified, the cause of occurrence, and the identification information of the manufacturing apparatus. Furthermore, the control device 300 stops the manufacturing apparatus determined by the poor-condition apparatus determination unit 305, or controls its control parameters.

D.製造管理裝置 D. Manufacturing management device

本發明之其他製造管理裝置是如第25圖所示般，由控制裝置300、製造裝置301、302、告知部303、記錄部304及叢集部105所構成。在此，叢集部105是與上述A、B項中所說明之構成相同。 The other manufacturing management device of the present invention is, as shown in Fig. 25, composed of the control device 300, the manufacturing apparatuses 301 and 302, the notification unit 303, the recording unit 304, and the clustering unit 105. Here, the clustering unit 105 has the same configuration as described in items A and B above.

叢集部105是與上述A至C之情形不同，藉由分類對象資料之特徵資料是藉由例如玻璃基板之製造過程中之製造條件(材料之分量、處理溫度、壓力、處理速度等)所構成之特徵量，按製程之各工程的製造狀態予以分類。上述特徵量是以被設置在各製造裝置301或302之感測器所檢測出之工程資訊被輸入至叢集部105以當作特徵量。 Unlike the cases A to C above, the clustering unit 105 classifies the classification target data, whose feature quantities consist of, for example, manufacturing conditions in the glass substrate manufacturing process (material fractions, processing temperature, pressure, processing speed, and the like), according to the manufacturing state of each step of the process. These feature quantities are process information detected by sensors provided in each manufacturing apparatus 301 or 302 and are input to the clustering unit 105 as feature quantities.

即是，叢集部105是藉由上述分類對象資料之特徵量，將各製造裝置之各工程中的玻璃製造過程之製造狀態，分類成「正常狀態」、「容易發生缺陷需要調整之狀態」、「危險而需要調整之狀態」等之叢集。然後，叢集部105藉由告知部303將上述分類結果通知至操作員，並且將分類結果之叢集的識別資訊輸出至控制裝置300，再者，使記錄部304對應於所判定之日時，記憶當作履歷之上述各工程之製造狀態的分類識別號碼；最成為問題之特徵量的製造條件；和其製造裝置之識別資訊。 That is, the clustering unit 105 classifies, from the feature quantities of the classification target data, the manufacturing state of the glass manufacturing process in each step of each manufacturing apparatus into clusters such as a "normal state", a "state prone to defects that requires adjustment", and a "dangerous state that requires adjustment". Then, the clustering unit 105 notifies the operator of the classification result through the notification unit 303, outputs the identification information of the cluster of the classification result to the control device 300, and furthermore causes the recording unit 304 to store, as a history in association with the determined date and time, the classification identification number of the manufacturing state of each step, the manufacturing conditions of the most problematic feature quantities, and the identification information of the manufacturing apparatus.
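The state classification described above can be sketched as a nearest-center assignment over the process-condition feature quantities. The state names, condition vectors, and values below are hypothetical stand-ins for the clusters learned from actual process data.

```python
import math

# Hypothetical state-cluster centers over (material fraction,
# temperature, pressure, line speed).
STATE_CENTERS = {
    "normal":           [0.50, 1100.0, 1.00, 3.0],
    "needs_adjustment": [0.55, 1150.0, 1.10, 3.5],
    "danger":           [0.65, 1250.0, 1.30, 4.5],
}

def classify_state(conditions):
    """Assign the current manufacturing conditions to the nearest
    state cluster (plain Euclidean distance, for illustration only)."""
    return min(STATE_CENTERS, key=lambda s: math.dist(conditions, STATE_CENTERS[s]))
```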

控制裝置300具有表示使叢集之識別資訊和製造條件返回正常之調整項目及其資料對應的表，讀出對應於自叢集部105所輸入之叢集識別資訊，使製造條件返回正常之調整項目及該資料，藉由所讀出之資料控制對應的製造裝置。 The control device 300 has a table associating the cluster identification information with adjustment items, and their data, for returning the manufacturing conditions to normal; it reads out the adjustment items and data corresponding to the cluster identification information input from the clustering unit 105, and controls the corresponding manufacturing apparatus according to the read-out data.
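The adjustment table and the resulting control action might be sketched as follows; the cluster names, parameter names, and correction values are hypothetical.

```python
# Hypothetical adjustment table: cluster id -> corrections that bring
# the manufacturing conditions back toward the normal state.
ADJUSTMENTS = {
    "needs_adjustment": {"temperature": -50.0, "line_speed": -0.5},
    "danger":           {"temperature": -150.0, "pressure": -0.3},
}

def plan_setpoints(cluster_id, current):
    """Apply the registered corrections to the current set-points,
    leaving parameters without a correction entry unchanged."""
    deltas = ADJUSTMENTS.get(cluster_id, {})
    return {k: current[k] + deltas.get(k, 0.0) for k in current}
```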

並且，即使將用以實現第1圖中之叢集系統之功能的程式記錄於電腦可讀取之記錄媒體，使電腦系統讀入記錄於該記錄媒體之程式，藉由實行執行分類對象資料之叢集化之處理亦可。並且，在此所指「電腦系統」包含OS或周邊機器等之硬體。再者，「電腦系統」也包含具備有網頁提供環境(或是顯示環境)之WWW系統。再者，「電腦可讀取之記錄媒體」是指軟碟、光磁碟、ROM、CD-ROM等之可攜帶性媒體、內藏於電腦系統之硬碟等之記憶裝置。並且，「電腦可讀取之記錄媒體」為含有如經網際網路等之網絡或電話回線等之通訊回線發送程式之時成為伺服器或客戶之電腦系統內部之揮發性記憶體(RAM)般，以一定時間保持程式者。 Further, the functions of the cluster system in Fig. 1 may also be realized by recording a program for realizing those functions on a computer-readable recording medium, having a computer system read in the program recorded on that recording medium, and executing it so as to carry out the clustering of the classification target data. The term "computer system" used here includes an OS and hardware such as peripheral devices. The "computer system" also includes a WWW system provided with a web page providing environment (or display environment). Furthermore, "computer-readable recording medium" means a portable medium such as a flexible disk, a magneto-optical disk, a ROM, or a CD-ROM, or a storage device such as a hard disk built into a computer system. In addition, "computer-readable recording medium" also includes media that hold the program for a fixed time, such as the volatile memory (RAM) inside a computer system serving as a server or client when the program is transmitted over a network such as the Internet or a communication line such as a telephone line.

再者，上述程式即使自在記憶裝置等儲存該程式之電腦系統，經傳送媒體或是藉由傳送媒體中之傳送波傳送至其他電腦系統亦可。在此，傳送程式之「傳送媒體」是如網際網路等之網絡(通訊網)或電話回線等之通訊回線(通訊線)般，具有傳送資訊之功能的媒體。再者，上述程式即使為用以實現上述功能之一部份者亦可。並且，即使為可以藉由與既已記錄於電腦系統之程式的組合來實現上述功能者，或是為差分檔案(差分程式)亦可。 Furthermore, the program may be transmitted from a computer system in which it is stored in a storage device or the like to another computer system via a transmission medium, or by transmission waves in the transmission medium. Here, the "transmission medium" that transmits the program means a medium having a function of transmitting information, such as a network (communication network) like the Internet or a communication line such as a telephone line. Moreover, the program may be one for realizing only part of the functions described above. It may also be a differential file (differential program), which realizes the functions described above in combination with a program already recorded in the computer system.

[產業上之利用可行性] [Industry use feasibility]

本發明是可以運用於如玻璃物品等之缺點檢測等般以高精度分類判別具有多種類特徵量之資訊的領域，並且亦可以利用於製造狀態檢測裝置或製品製造管理裝置。 The present invention can be applied to fields in which information having many kinds of feature quantities is classified and discriminated with high precision, such as defect detection in glass articles, and can also be used in a manufacturing state detection device or a product manufacturing management device.

並且，在此引用2006年7月6日申請的日本專利申請號2006-186628之說明書、申請專利範圍、圖式及發明摘要的全部內容，作為本發明之說明書的揭示內容而被採用。 The entire contents of the specification, claims, drawings, and abstract of Japanese Patent Application No. 2006-186628 filed on July 6, 2006 are incorporated herein by reference as the disclosure of the specification of the present invention.

1‧‧‧特徵量集合作成部 1‧‧‧Feature quantity set creation unit

2‧‧‧特徵量抽出部 2‧‧‧Feature quantity extraction unit

3‧‧‧距離計算部 3‧‧‧Distance calculation unit

4‧‧‧特徵量集合記憶部 4‧‧‧Feature quantity set storage unit

5‧‧‧叢集資料庫 5‧‧‧Cluster database

100‧‧‧被檢查物 100‧‧‧Object to be inspected

101‧‧‧畫像取得部 101‧‧‧Image acquisition unit

102‧‧‧照明裝置 102‧‧‧Illumination device

103‧‧‧攝像裝置 103‧‧‧Imaging device

104‧‧‧缺陷候補檢測部 104‧‧‧Defect candidate detection unit

105‧‧‧叢集部 105‧‧‧Clustering unit

200、300‧‧‧控制裝置 200, 300‧‧‧Control device

201、202‧‧‧畫像取得裝置 201, 202‧‧‧Image acquisition device

301、302‧‧‧製造裝置 301, 302‧‧‧Manufacturing apparatus

303‧‧‧告知部 303‧‧‧Notification unit

304‧‧‧記錄部 304‧‧‧Recording unit

第1圖是表示本發明之第1及第2實施形態之叢集系統之構成例的方塊圖。 Fig. 1 is a block diagram showing a configuration example of a cluster system according to the first and second embodiments of the present invention.

第2圖是說明對於藉由判別基準值γ選擇特徵集合之處理的列表。 Fig. 2 is a table for explaining the process of selecting feature sets by the discrimination reference value γ.

第3圖是說明對於藉由判別基準值γ選擇特徵集合之處理的列表。 Fig. 3 is a table for explaining the process of selecting feature sets by the discrimination reference value γ.

第4圖是說明對於藉由判別基準值γ選擇特徵集合之效果的直方圖。 Fig. 4 is a histogram for explaining the effect of selecting feature sets by the discrimination reference value γ.

第5圖是表示藉由第1實施形態對各叢集選擇特徵量集合的處理之動作例的流程圖。 Fig. 5 is a flowchart showing an operation example of a process of selecting a feature amount set for each cluster by the first embodiment.

第6圖是表示藉由第1實施形態對分類對象資料執行叢集化之處理之動作例的流程圖。 Fig. 6 is a flowchart showing an operation example of the clustering processing performed on classification target data in the first embodiment.

第7圖是表示生成第2實施形態中叢集化之處理所使用之規則模式之表的動作例之流程圖。 Fig. 7 is a flowchart showing an operation example of a table for generating a rule pattern used in the process of clustering in the second embodiment.

第8圖是表示藉由第2實施形態對分類對象資料執行叢集化之處理之動作例的流程圖。 Fig. 8 is a flowchart showing an operation example of a process of performing clustering on classification target data by the second embodiment.

第9圖是表示藉由第2實施形態對分類對象資料執行其他叢集化之處理之動作例的流程圖。 Fig. 9 is a flowchart showing an operation example of a process of performing other clustering on the classification target data by the second embodiment.

第10圖是表示藉由第3實施形態對分類對象資料執行叢集化之處理之動作例的流程圖。 Fig. 10 is a flowchart showing an operation example of a process of performing clustering on classification target data by the third embodiment.

第11圖是表示設定當作特徵量之變換方法的運算式之動作例的流程圖。 Fig. 11 is a flowchart showing an example of an operation of setting an arithmetic expression as a method of converting a feature amount.

第12圖是表示第11圖之流程圖中算出評估值之動作例的流程圖。 Fig. 12 is a flow chart showing an operation example of calculating an evaluation value in the flowchart of Fig. 11.

第13圖是表示算出使用所設定之變換方法所變換出之特徵量的距離之動作例的流程圖。 Fig. 13 is a flowchart showing an example of an operation for calculating a distance of a feature amount converted using the set conversion method.

第14圖是表示屬於各叢集之學習資料的列表。 Figure 14 is a list showing the learning materials belonging to each cluster.

第15圖是表示將第14圖之學習資料藉由以往例之叢集方法予以分類之結果的結果列表。 Fig. 15 is a list of results showing the results of classifying the learning materials of Fig. 14 by the clustering method of the conventional example.

第16圖是說明全體補正判定率之算出方法的概念圖。 Fig. 16 is a conceptual diagram for explaining a method of calculating the total correction determination rate.

第17圖是表示藉由第1實施形態中之叢集系統分類第14圖之學習資料之結果的結果列表。 Fig. 17 is a view showing a result of the result of classifying the learning materials of Fig. 14 by the cluster system in the first embodiment.

第18圖是表示藉由第2實施形態中之叢集系統分類第14圖之學習資料的結果列表。 Fig. 18 is a view showing a result list of the learning materials classified in Fig. 14 by the cluster system in the second embodiment.

第19圖是表示藉由第2實施形態中之叢集系統分類第14圖之學習資料之結果列表。 Fig. 19 is a view showing a result list of the learning materials of Fig. 14 classified by the cluster system in the second embodiment.

第20圖是表示使用本發明之叢集系統之檢查裝置之構成例的方塊圖。 Fig. 20 is a block diagram showing an example of the configuration of an inspection apparatus using the cluster system of the present invention.

第21圖是表示第20圖之檢查裝置中之特徵量集合之選擇動作例之流程圖。 Fig. 21 is a flow chart showing an example of the selection operation of the feature amount set in the inspection apparatus of Fig. 20.

第22圖是表示第20圖之檢查裝置中之叢集化之處理之動作例的流程圖。 Fig. 22 is a flow chart showing an operation example of the process of clustering in the inspection apparatus of Fig. 20.

第23圖是表示使用本發明之叢集系統之缺陷種類判定裝置之構成例的方塊圖。 Fig. 23 is a block diagram showing a configuration example of a defect type determination device using the cluster system of the present invention.

第24圖是表示使用本發明之叢集系統之製造管理裝置之構成例的方塊圖。 Fig. 24 is a block diagram showing an example of the configuration of a manufacturing management apparatus using the cluster system of the present invention.

第25圖是表示使用本發明之叢集系統之其他製造管理裝置之構成例的方塊圖。 Fig. 25 is a block diagram showing an example of the configuration of another manufacturing management apparatus using the cluster system of the present invention.

1‧‧‧特徵量集合作成部 1‧‧‧Feature quantity set creation unit

2‧‧‧特徵量抽出部 2‧‧‧Feature quantity extraction unit

3‧‧‧距離計算部 3‧‧‧Distance calculation unit

4‧‧‧特徵量集合記憶部 4‧‧‧Feature quantity set storage unit

5‧‧‧叢集資料庫 5‧‧‧Cluster database

Claims (13)

一種叢集系統，屬於將輸入資料藉由該輸入資料所具有之特徵量，分類成藉由學習資料之母集團所形成之各個叢集的叢集系統，其特徵為：具有：特徵量集合記憶部，其係對應於上述各個叢集，記憶有分類所使用之特徵量之組合的特徵量集合；特徵量抽出部，其係自輸入資料抽出事先所設定之特徵量；距離計算部，其係在對應於各叢集之每個特徵量集合，根據該特徵量集合所含之特徵量，計算出各叢集之母集團之中心和上述輸入資料之距離，以當作各個集合距離而予以輸出；和順位抽出部，其係由小至大之順位配列上述各集合距離，對每個叢集設定多個上述特徵量集合，又具有叢集分類部，其係藉由表示根據在每個特徵量集合取得之上述集合距離中的該集合距離的順位設定之輸入資料分類到各個叢集之分類基準的規則模式，檢測出上述輸入資料屬於哪一個叢集。 A cluster system which classifies input data, by means of the feature quantities the input data possesses, into clusters each formed from a population of learning data, comprising: a feature quantity set storage unit which stores, in correspondence with each of the clusters, feature quantity sets each being a combination of the feature quantities used for classification; a feature quantity extraction unit which extracts predetermined feature quantities from the input data; a distance calculation unit which, for each feature quantity set corresponding to each cluster, calculates the distance between the center of the population of that cluster and the input data from the feature quantities contained in the feature quantity set, and outputs it as a set distance; and a rank extraction unit which arranges the set distances in order from smallest to largest; wherein a plurality of the feature quantity sets are set for each cluster, and the system further comprises a cluster classification unit which detects which cluster the input data belongs to by a rule pattern representing a classification criterion that assigns the input data to each cluster according to the ranks, among the set distances obtained for the respective feature quantity sets, of those set distances.
如申請專利範圍第1項所記載之叢集系統，其中，上述叢集分類部藉由上述集合距離之順位，檢測出上述輸入資料屬於哪一個叢集，並將該順位成為上位之集合距離為多之叢集，當作上述輸入資料所屬之叢集而予以檢測出。 The cluster system according to claim 1, wherein the cluster classification unit detects, from the ranks of the set distances, which cluster the input data belongs to, and detects, as the cluster to which the input data belongs, the cluster having the larger number of set distances at the upper ranks. 如申請專利範圍第2項所記載之叢集系統，其中，上述叢集分類部具有相對於順位成為上位之數量的臨界值，成為上位之叢集若為該臨界值以上時，則當作輸入資料所屬之叢集而予以檢測出。 The cluster system according to claim 2, wherein the cluster classification unit has a threshold value for the number of upper ranks, and detects a cluster as the one to which the input data belongs when the number of its upper-rank set distances is equal to or greater than the threshold value. 如申請專利範圍第1至3項中之任一項所記載之叢集系統，其中，上述距離計算部對上述集合距離，乘算對應於特徵量集合而所設定之補正係數，使各特徵量集合間之集合距離予以標準化。 The cluster system according to any one of claims 1 to 3, wherein the distance calculation unit multiplies each set distance by a correction coefficient set in correspondence with the feature quantity set, thereby normalizing the set distances among the feature quantity sets.
如申請專利範圍第1至3項中之任一項所記載之叢集系統，其中，又具有作成每個叢集之特徵量集合之特徵量集合作成部，上述特徵量集合作成部是針對各特徵量之多數組合中之每個組合，將各叢集之母集團之學習資料之平均值當作原點，求出該原點和其他叢集之母集團之各學習資料之距離之平均值，並將成為最大平均值之特徵量之組合，選擇為用於將各叢集從其他叢集辨別出之特徵量集合。 The cluster system according to any one of claims 1 to 3, further comprising a feature quantity set creation unit which creates the feature quantity sets for each cluster, wherein, for each of a plurality of combinations of feature quantities, the feature quantity set creation unit takes the average of the learning data of the population of each cluster as an origin, obtains the average of the distances between that origin and each item of learning data of the populations of the other clusters, and selects the combination of feature quantities giving the largest average as the feature quantity set used to discriminate that cluster from the other clusters. 一種缺陷種類判定裝置，其特徵為：設置如上述申請專利範圍第1至5項中之任一項所記載之叢集系統，上述輸入資料為製品缺陷之畫像資料，藉由表示缺陷之特徵量，將畫像資料中之缺陷按照缺陷之種類進行分類。 A defect type determination device characterized in that the cluster system according to any one of claims 1 to 5 is provided, the input data is image data of product defects, and the defects in the image data are classified by defect type using feature quantities representing the defects. 如申請專利範圍第6項所記載之缺陷種類判定裝置，其中，上述製品為玻璃物品，將該玻璃物品之缺陷按照缺陷之種類進行分類。 The defect type determination device according to claim 6, wherein the product is a glass article, and the defects of the glass article are classified by defect type. 一種缺陷檢測裝置，其特徵為：設置有如申請專利範圍第6或7項所記載之缺陷種類判定裝置，用以檢測製品之缺陷的類別。 A defect detection device characterized in that the defect type determination device according to claim 6 or 7 is provided to detect the type of product defects.
一種製造狀態判定裝置，其特徵為：設置有如申請專利範圍第6或7項所記載之缺陷種類判定裝置，執行製品之缺陷的分類，根據與對應於該類別之發生要因的對應性，檢測出製程中缺陷之發生要因。 A manufacturing state determination device characterized in that the defect type determination device according to claim 6 or 7 is provided, classification of product defects is performed, and the cause of defects occurring in the manufacturing process is detected from the correspondence between each defect type and its cause. 一種製造狀態判定裝置，其特徵為：設置有如申請專利範圍第1至5項中之任一項所記載之叢集系統，上述輸入資料為表示製品之製程中之製造條件的特徵量，按照製程之各工程的製造狀態對該特徵量進行分類。 A manufacturing state determination device characterized in that the cluster system according to any one of claims 1 to 5 is provided, the input data are feature quantities representing manufacturing conditions in the manufacturing process of a product, and the feature quantities are classified according to the manufacturing state of each step of the process. 如申請專利範圍第10項所記載之製造狀態判定裝置，其中，上述製品為玻璃物品，將該玻璃物品之製程中之特徵量，按照製程之各工程的製造狀態進行分類。 The manufacturing state determination device according to claim 10, wherein the product is a glass article, and the feature quantities in the manufacturing process of the glass article are classified according to the manufacturing state of each step of the process. 一種製造狀態檢測裝置，其特徵為：設置有如申請專利範圍第10或11項所記載之製造狀態判定裝置，檢測出製品製程之各工程中的製造狀態之類別。 A manufacturing state detection device characterized in that the manufacturing state determination device according to claim 10 or 11 is provided to detect the category of the manufacturing state in each step of the product manufacturing process.
一種製品製造管理裝置，其特徵為：設置有如申請專利範圍第10或11項所記載之製造狀態判定裝置，執行檢測出製品製程之各工程中的製造狀態之類別，並根據對應於該類別之控制項目，進行製程工程中之製程控制。 A product manufacturing management device characterized in that the manufacturing state determination device according to claim 10 or 11 is provided, the category of the manufacturing state in each step of the product manufacturing process is detected, and process control in the manufacturing step is performed according to the control items corresponding to that category.
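The feature quantity set creation of claim 5 — for each combination of feature quantities, take this cluster's mean as the origin and keep the combination that maximizes the average distance to the other clusters' learning data — can be sketched as follows. The sample values in the usage note are hypothetical.

```python
import math
from itertools import combinations

def select_feature_set(own_samples, other_samples, n_features, set_size=2):
    """For every combination of set_size feature indices, use the mean of
    this cluster's learning data as the origin, average the distances from
    that origin to each learning sample of the other clusters, and return
    the combination with the largest average (best separation)."""
    def mean(samples, i):
        return sum(s[i] for s in samples) / len(samples)

    best_combo, best_avg = None, -1.0
    for combo in combinations(range(n_features), set_size):
        origin = [mean(own_samples, i) for i in combo]
        dists = [
            math.sqrt(sum((s[i] - o) ** 2 for i, o in zip(combo, origin)))
            for s in other_samples
        ]
        avg = sum(dists) / len(dists)
        if avg > best_avg:
            best_combo, best_avg = combo, avg
    return best_combo
```

For instance, if this cluster's samples all sit at the origin and the other clusters differ only in feature 0, any pair of indices containing feature 0 separates them best, and the first such pair is returned.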
TW096124741A 2006-07-06 2007-07-06 Cluster system and defect type determination device TWI434229B (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP2006186628 2006-07-06

Publications (2)

Publication Number Publication Date
TW200818060A TW200818060A (en) 2008-04-16
TWI434229B true TWI434229B (en) 2014-04-11

Family

ID=38894527

Family Applications (1)

Application Number Title Priority Date Filing Date
TW096124741A TWI434229B (en) 2006-07-06 2007-07-06 Cluster system and defect type determination device

Country Status (5)

Country Link
JP (1) JP5120254B2 (en)
KR (1) KR100998456B1 (en)
CN (1) CN101484910B (en)
TW (1) TWI434229B (en)
WO (1) WO2008004559A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI802554B (en) * 2016-12-20 2023-05-21 日商日本電氣硝子股份有限公司 Manufacturing method of glass substrate

Families Citing this family (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP5211810B2 (en) * 2008-04-03 2013-06-12 新日鐵住金株式会社 Flaw learning device, flaw learning method, and computer program
JP5163505B2 (en) * 2008-04-03 2013-03-13 新日鐵住金株式会社 Flaw learning device, flaw learning method, and computer program
US20100082537A1 (en) * 2008-09-29 2010-04-01 Menahem Lasser File system for storage device which uses different cluster sizes
WO2010058759A1 (en) 2008-11-20 2010-05-27 旭硝子株式会社 Transparent body inspecting device
JP5465689B2 (en) * 2011-02-28 2014-04-09 株式会社日立製作所 High-precision similarity search system
JP5943722B2 (en) * 2012-06-08 2016-07-05 三菱重工業株式会社 Defect determination apparatus, radiation imaging system, and defect determination method
CN107110743B (en) * 2015-01-21 2019-12-10 三菱电机株式会社 Inspection data processing device and inspection data processing method
CN107111643B (en) * 2015-01-22 2018-12-28 三菱电机株式会社 Time series data retrieves device
KR102260976B1 (en) * 2017-10-30 2021-06-04 현대모비스 주식회사 Apparatus for manufacturing object false positive rejector
CN107941812B (en) * 2017-12-20 2021-07-16 联想(北京)有限公司 Information processing method and electronic equipment
JP2019204232A (en) * 2018-05-22 2019-11-28 株式会社ジェイテクト Information processing method, information processor, and program
KR102334489B1 (en) * 2018-07-31 2021-12-02 미쓰비시덴키 가부시키가이샤 Information processing apparatus, program recording medium and information processing method
CN109522931A (en) * 2018-10-18 2019-03-26 深圳市华星光电半导体显示技术有限公司 Judge the method and its system of the folded figure aggregation of defect
JP7028133B2 (en) 2018-10-23 2022-03-02 オムロン株式会社 Control system and control method
WO2021070505A1 (en) * 2019-10-07 2021-04-15 パナソニックIpマネジメント株式会社 Classification system, classification method, and program
US11989226B2 (en) * 2020-01-08 2024-05-21 Panasonic Intellectual Property Management Co., Ltd. Classification system and classification method using lerned model and model including data distribution
JP6973544B2 (en) * 2020-03-31 2021-12-01 株式会社Sumco Status determination device, status determination method, and status determination program
CN111984812B (en) * 2020-08-05 2024-05-03 沈阳东软智能医疗科技研究院有限公司 Feature extraction model generation method, image retrieval method, device and equipment
CN112730427B (en) * 2020-12-22 2024-02-09 安徽康能电气有限公司 Product surface defect detection method and system based on machine vision
CN113312400B (en) * 2021-06-02 2024-01-30 蚌埠凯盛工程技术有限公司 Float glass grade judging method and device
KR102464945B1 (en) * 2021-08-18 2022-11-10 한국과학기술정보연구원 Apparatus and method for analyzing signal data state using machine learning
KR102795578B1 (en) * 2021-10-12 2025-04-15 경기대학교 산학협력단 Apparatus and method for generating image annotation based on shap
CN115687961B (en) * 2023-01-03 2023-06-27 苏芯物联技术(南京)有限公司 Automatic welding procedure intelligent recognition method based on pattern recognition

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP3095623B2 (en) * 1994-06-16 2000-10-10 松下電器産業株式会社 Attribute judgment method
JP2690027B2 (en) * 1994-10-05 1997-12-10 株式会社エイ・ティ・アール音声翻訳通信研究所 Pattern recognition method and apparatus
US6307965B1 (en) * 1998-04-30 2001-10-23 International Business Machines Corporation System and method for detecting clusters of information
JP4132229B2 (en) * 1998-06-03 2008-08-13 株式会社ルネサステクノロジ Defect classification method
JP3475886B2 (en) * 1999-12-24 2003-12-10 日本電気株式会社 Pattern recognition apparatus and method, and recording medium
JP2002099916A (en) * 2000-09-25 2002-04-05 Olympus Optical Co Ltd Pattern-classifying method, its device, and computer- readable storage medium
JP2003344300A (en) * 2002-05-21 2003-12-03 Jfe Steel Kk Surface defect determination method
JP2004165216A (en) * 2002-11-08 2004-06-10 Matsushita Electric Ind Co Ltd Production management method and production management device
JP4553300B2 (en) * 2004-09-30 2010-09-29 Kddi株式会社 Content identification device
JP2006105943A (en) * 2004-10-08 2006-04-20 Omron Corp Knowledge creation device, parameter search method, and program product

Also Published As

Publication number Publication date
KR20090018920A (en) 2009-02-24
JPWO2008004559A1 (en) 2009-12-03
CN101484910B (en) 2015-04-08
TW200818060A (en) 2008-04-16
KR100998456B1 (en) 2010-12-06
CN101484910A (en) 2009-07-15
WO2008004559A1 (en) 2008-01-10
JP5120254B2 (en) 2013-01-16

Similar Documents

Publication Publication Date Title
TWI434229B (en) Cluster system and defect type determination device
US20240212149A1 (en) System and method of classification of biological particles
US10607331B1 (en) Image segmentation into overlapping tiles
CN104408095B (en) One kind is based on improved KNN file classification methods
CN110352389B (en) Information processing device and information processing method
CN112749893B (en) Data mining service platform based on cloud computing
Al‐Tahhan et al. Accurate automatic detection of acute lymphatic leukemia using a refined simple classification
CN118761591B (en) Project matching method and system based on automatic data identification
CN112632000A (en) Log file clustering method and device, electronic equipment and readable storage medium
CN118507033B (en) Automatic interpretation and diagnosis support system for medical examination results
JP2021192155A (en) Program, method and system for supporting abnormality detection
CN111652430A (en) A method and system for predicting the default rate of an Internet financial platform
CN110929877A (en) Model establishing method, device, equipment and storage medium based on transfer learning
CN115081716A (en) Enterprise default risk prediction method, computer equipment and storage medium
CN117523324B (en) Image processing method, image sample classification method, device and storage medium
Shaha et al. Tos: A relative metric approach for model selection in machine learning solutions
Cai et al. Fuzzy criteria in multi-objective feature selection for unsupervised learning
CN111461199B (en) Safety attribute selection method based on distributed junk mail classified data
Singaravelan et al. Refining cbir using rule based knn
CN119167270B (en) Verification processing method and system for forestry region biological investigation data
CN118365142B (en) Automated reagent storage method, device and system
CN113378881B (en) Instruction set identification method and device based on information entropy gain SVM model
Uyun Feature selection based on multi-filters for classification of mammogram images to look for signs of breast cancer
Cervinka et al. Visual Measurement of Material Segregation in Steel Wires
JP2011221603A (en) Multimedia recognition/retrieval system using dominant feature

Legal Events

Date Code Title Description
MM4A Annulment or lapse of patent due to non-payment of fees