
TW200818060A - Clustering system and defect kind judging device

Info

Publication number
TW200818060A
TW200818060A (application no. TW096124741A)
Authority
TW
Taiwan
Prior art keywords
cluster
feature
distance
feature quantity
clusters
Prior art date
Application number
TW096124741A
Other languages
Chinese (zh)
Other versions
TWI434229B (en)
Inventor
Makoto Kurumisawa
Akio Suguro
Koji Ohnishi
Original Assignee
Asahi Glass Co Ltd
Priority date
Filing date
Publication date
Application filed by Asahi Glass Co Ltd
Publication of TW200818060A
Application granted
Publication of TWI434229B

Classifications

    • G06F 9/00 - Arrangements for program control, e.g. control units
    • G06N 20/00 - Machine learning
    • G06F 16/583 - Retrieval of still image data characterised by using metadata automatically derived from the content
    • G06F 18/241 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F 3/00 - Input arrangements for transferring data to be processed into a form capable of being handled by the computer; output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06N 5/00 - Computing arrangements using knowledge-based models
    • G06T 7/00 - Image analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Medical Informatics (AREA)
  • Library & Information Science (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computational Linguistics (AREA)
  • Human Computer Interaction (AREA)
  • Databases & Information Systems (AREA)
  • Image Analysis (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Provided is a clustering system that classifies target data faster and more precisely than the prior art. The system classifies input data into clusters, each formed from a population of learning data, according to the feature quantities of the input data. It comprises: a feature-set storage unit that stores, in association with each cluster, a feature set (parameter set), that is, the combination of feature quantities used for classification; a feature extraction unit that extracts the predetermined feature quantities from the input data; a distance calculation unit that, for the feature set of each cluster, computes the distance between the center of that cluster's population and the input data from the feature quantities contained in the set, and outputs it as the set distance of that cluster; and a rank extraction unit that arranges the set distances in ascending order.

Description

[Technical Field of the Invention]

The present invention relates to a clustering system and a defect kind judging device that cut out a partial image of the defective portion from an image of an inspection object, extract feature quantities of the defect from the partial image, and classify the defect by defect kind.

[Prior Art]

Clustering based on the Mahalanobis generalized distance has conventionally been used: clustering is performed by judging whether unknown data belong to a cluster formed from a population learned in advance. For example, which population's cluster an unknown datum belongs to is judged from the magnitude of its Mahalanobis distance to each of a plurality of clusters (see Patent Document 1). To compute the distance efficiently, a subset of the feature quantities is selected before clustering is performed.

Methods are also common that decide the cluster of unknown data by voting over the results of a plurality of classifiers, for example recognition results obtained from the outputs of different sensors, or recognition results for different regions of one image (see Patent Document 2).

In diagnosis from parameters obtained by blood tests, that is, clustering by disease, there is a method that forms a pair from every two clusters, judges for every pair which of the two the test data resemble, and assigns the data to the cluster judged most often, by the total of those judgments (see Patent Document 3).

When classifying the defects of an LCD glass substrate into predetermined defect kinds, there is also a cluster analysis in which the feature quantities used for classification are optimized for the discrimination performed at classification, each feature quantity is weighted according to that optimization, and cluster membership is judged with the optimized feature quantities (see Patent Document 4).

[Patent Document 1] Japanese Laid-Open Patent Publication No. 2005-214682
[Patent Document 2] Japanese Laid-Open Patent Publication No. 2001-56861
[Patent Document 3] Japanese Laid-Open Patent Publication No. H07-105166
[Patent Document 4] Japanese Laid-Open Patent Publication No. 2002-99916

[Problem to Be Solved by the Invention]

The cluster analysis of Patent Document 3, however, performs no optimization for the individual pairs, so the feature quantities that should serve as discriminating material are not used effectively; and when the clusters to be discriminated become numerous, the number of pairs becomes enormous and the time taken by the judgment increases. The cluster analysis of Patent Document 4 improves the discrimination precision by weighting the feature quantities on the basis of the judgment rate, but has no notion of optimizing the feature quantities per cluster; like Patent Document 3 it cannot exploit the feature quantities effectively, and so cannot classify with high precision.

The present invention was made in view of the above. It exploits the feature quantities extracted from the classification target data when judging which cluster they belong to, and classifies the classification target data faster and with higher precision than the prior art, for example classifying defects on a glass surface into clusters corresponding to the defect kinds; it provides a clustering system and a defect kind judging device for this purpose.

[Means for Solving the Problem]

Unlike the prior art, which computes the distance between the classification target data and every cluster with the same kinds of feature quantities, the present invention assigns to each cluster a set of feature quantities chosen so that differences from the other clusters can be obtained, and computes the distance to each cluster with its own feature quantities; classification is therefore more precise than before. Because each feature set is derived from the characteristics of the learning data belonging to its cluster, it consists of feature quantities that distinguish that cluster from the others. Specifically, the present invention adopts the following configurations.

The clustering system of the present invention classifies input data into clusters, each formed from a population of learning data, according to the feature quantities (parameters) of the input data, and comprises: a feature-set storage part that stores, in association with each cluster, a feature set (parameter set), the combination of feature quantities used for classification; a feature extraction part that extracts the predetermined feature quantities from the input data; a distance calculation part that, for the feature set of each cluster, computes the distance between the center of that cluster's population and the input data from the feature quantities contained in the set, and outputs it as the set distance of that cluster; and a rank extraction part that arranges the set distances in ascending order.

Preferably, a plurality of feature sets is assigned to each cluster.

Preferably, the system further comprises a cluster classification part that detects which cluster the input data belong to by means of rule patterns, which express classification criteria over the ranks of the set distances obtained for the individual feature sets.

Preferably, the cluster classification part detects the cluster of the input data from the ranks of the set distances, taking the cluster with the most set distances among the top ranks as the cluster of the input data. Preferably, the cluster classification part has a threshold on the number of appearances among the top ranks, and a cluster is detected as that of the input data only if it appears at least as often as the threshold.

Preferably, the distance calculation part multiplies each set distance by a correction coefficient determined for the feature set, normalizing the set distances across feature sets.

Preferably, the system further comprises a feature-set creation part that creates the feature set of each cluster: for each of the many combinations of feature quantities it takes the mean of the learning data of a cluster's population as an origin, computes the mean distance from that origin to each learning datum of the other clusters' populations, and selects, as the feature set for distinguishing the cluster from the others, the combination with the largest mean.

The defect kind judging device of the present invention comprises any of the clustering systems described above; the input data are image data of product defects, and the defects in the image data are classified by defect kind using feature quantities expressing the defects. Preferably, the product is a glass article, and the defects of the glass article are classified by defect kind.

The defect detection device of the present invention comprises the defect kind judging device described above and detects the kind of a product defect.

The manufacturing state judging device of the present invention comprises the defect kind judging device described above, classifies product defects, and detects the cause of a defect in the process from the correspondence between the defect kind and its causes. A preferred manufacturing state judging device comprises the clustering system described above, the input data being feature quantities expressing the manufacturing conditions in the product's process, and classifies those feature quantities by the manufacturing state of each step of the process. Preferably, the product is a glass article, and the feature quantities of the glass article's process are classified by the manufacturing state of each step.

The manufacturing state detection device of the present invention comprises the manufacturing state judging device described above and detects the class of the manufacturing state in each step of the product's process.

The product manufacturing management device of the present invention comprises the manufacturing state judging device described above, detects the class of the manufacturing state in each step of the product's process, and controls the process steps according to the control items corresponding to that class.

[Effects of the Invention]

As explained above, according to the present invention, for every destination cluster a combination of feature quantities is set in advance, out of the many feature quantities of the classification target data, so that the distance to the other clusters becomes large; the distance between the classification target data and each cluster is computed with that cluster's own combination, and the data are classified into the cluster of minimum computed distance. The classification target data can therefore be classified into the corresponding cluster more correctly than with prior methods. Further, when a plurality of such combinations is set per cluster, the computed distances between all clusters and the classification target data are arranged in ascending order, and the data are classified into the cluster appearing most often among a predetermined number of top ranks, which again yields higher precision than before.

[Embodiments]

The clustering system of the present invention classifies the input data of a classification target into clusters, each formed with learning data as its population, according to the feature quantities of the input data. It has a feature-set storage part storing, for each cluster, the feature set, the combination of feature quantities used for classification; the feature extraction part extracts the feature quantities from the input data according to the predetermined feature sets; the distance calculation part computes, for the feature set of each cluster, the distance between the population and the input data from the feature quantities contained in the set, as the set distance; and the rank extraction part arranges the set distances in ascending order, the clustering being executed according to that order.

[First Embodiment]

A clustering system according to the first embodiment of the present invention is described below with reference to the drawings. Fig. 1 is a block diagram showing a configuration example of the clustering system of this embodiment. As shown in Fig. 1, the system has a feature-set creation part 1, a feature extraction part 2, a distance calculation part 3, a feature-set storage part 4, and a cluster database 5.

The feature-set storage part 4 stores, in association with the identification information of each cluster, the feature set individually assigned to that cluster: the combination of feature quantities of the classification target data. For example, when the classification target data consist of the set of feature quantities {a, b, c, d}, the feature sets assigned to the clusters are combinations such as [a, b], [a, b, c, d] or [c]. In the following, "combination of feature quantities" means any combination drawn from the set of feature quantities: all of them, several of them (two or three in the above example), or a single one.

When clusters A, B and C are set as the destination clusters, the feature set of each cluster is determined, using the learning data already classified into the clusters, as the combination that makes the distance between that cluster and the other clusters large, and is stored in the feature-set storage part 4. For example, the feature set assigned to cluster A is the combination that maximizes the distance between the vector of the means of the feature quantities of the learning data belonging to cluster A and the vectors of the means of the feature quantities of the learning data belonging to the other clusters B and C. The classification target data and the learning data of each cluster's population consist of the same set of feature quantities.

When computing the distance between input classification target data and a cluster, the feature extraction part 2 reads from the feature-set storage part 4 the feature set of the cluster being evaluated, extracts from the classification target data the feature quantities of that combination, and outputs them to the distance calculation part 3.

The distance calculation part 3 reads from the cluster database 5, keyed by the identification information of the cluster being evaluated, the vector of the means of the feature quantities of that cluster's learning data, and computes, over the cluster's feature set, the distance between the vector of feature quantities extracted from the classification target data and the vector of means (the barycentric vector marking the center of gravity of the cluster's learning data).

To remove differences between the data units of the feature quantities, the distance calculation part 3 standardizes the values of the feature quantities before computing the distance: each feature quantity v(i) of the classification target data is normalized by the following formula (1):

    V(i) = (v(i) - avg(i)) / std(i)    ... (1)

Here v(i) is the feature quantity, avg(i) is the mean of that feature quantity over the learning data in the cluster being evaluated, std(i) is its standard deviation over the same learning data, and V(i) is the normalized feature quantity.
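As an illustration of formula (1), the sketch below standardizes one feature vector against a cluster's learning data. All code in this document is illustrative only: the function names, the data layout (NumPy arrays with one row per learning datum), and the example numbers are assumptions, not part of the patent.

```python
import numpy as np

def normalize_features(v, cluster_learning_data):
    """Formula (1): V(i) = (v(i) - avg(i)) / std(i), with avg and std taken
    over the learning data of the cluster being evaluated."""
    avg = cluster_learning_data.mean(axis=0)          # avg(i) per feature
    std = cluster_learning_data.std(axis=0, ddof=1)   # std(i) per feature
    return (v - avg) / std

# Hypothetical example: 30 learning samples with 5 scratch features {a..e}
rng = np.random.default_rng(0)
learning = rng.normal([5.0, 80.0, 0.3, 0.9, 0.1],
                      [1.0, 10.0, 0.05, 0.02, 0.01], size=(30, 5))
v = np.array([5.2, 85.0, 0.28, 0.91, 0.12])
print(normalize_features(v, learning))
```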
Accordingly, when computing a distance, the distance calculation part 3 must perform the normalization of each feature quantity separately for every feature set: for each feature quantity used in the distance of the classification target data, it applies the normalization with the mean and standard deviation of the corresponding feature quantity of the learning data.

As the distance, any of the standardized Euclidean distance, the Mahalanobis distance, or the Minkowski distance over the standardized feature quantities may be adopted. When the Mahalanobis distance is used, the Mahalanobis squared distance MHD is obtained by the following formula (2):

    MHD = (1/n) * (V^T R^-1 V)    ... (2)
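Of the three metrics listed, the standardized Euclidean distance is the simplest; a sketch restricted to one cluster's feature set might look as follows (the index-array convention for the feature set is an assumption made here):

```python
import numpy as np

def standardized_euclidean(v, cluster_learning_data, feature_idx):
    """Distance from sample v to a cluster's center, using only the
    feature quantities in that cluster's feature set (feature_idx)."""
    data = cluster_learning_data[:, feature_idx]
    z = (v[feature_idx] - data.mean(axis=0)) / data.std(axis=0, ddof=1)
    return float(np.sqrt(np.sum(z ** 2)))
```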

Each element V(i) of the vector V in formula (2) is the feature quantity obtained from formula (1): the multidimensional feature quantity v(i) of the unknown data, normalized with the mean avg(i) and standard deviation std(i) of the learning data in the cluster. n is the number of degrees of freedom; in this embodiment it is the number of feature quantities in the feature set (described later). The Mahalanobis squared distance thus sums the differences of the n transformed feature quantities, and dividing by n, as (Mahalanobis squared distance)/n, makes the average unit distance of the population equal to 1. V^T is the transpose of the vector V whose elements are the feature quantities V(i), and R^-1 is the inverse of the correlation matrix R between the feature quantities of the learning data in the cluster.

The feature-set creation part 1 computes, for each cluster, the feature set used by the distance calculation part when calculating the distance between the classification target data and that cluster, and writes the result into the feature-set storage part 4 in association with the cluster's identification information.

To compute a feature set, the feature-set creation part 1 takes, for each cluster, the barycentric vector of the learning data belonging to the target cluster and the barycentric vector of the learning data belonging to all clusters other than the target cluster, and computes the value λ of a discriminant criterion by the following formula (3). In the following, a combination of feature quantities is treated as a feature set.

    λ = ω0*ω1*(μ0 - μ1)^2 / (ω0*σ0^2 + ω1*σ1^2)    ... (3)

In formula (3), μ1 is the barycentric vector formed from the means of the feature quantities, over the feature set, of the learning data belonging to the target cluster (the within-cluster population); σ1 is the standard deviation of the vectors generated from the feature quantities of the learning data of that population; ω1 is the ratio of the number of learning data in the within-cluster population to the number of learning data in all clusters. Likewise, μ0 is the barycentric vector formed from the means of the feature quantities of the learning data belonging to the clusters outside the target cluster (the outside population); σ0 is the standard deviation of the vectors generated from the feature quantities of that outside population; and ω0 is the ratio of the number of learning data outside the cluster to the number in all clusters. In formula (3), (μ0 - μ1) may also be evaluated through a log or a square root. When computing these vectors, the feature-set creation part 1 normalizes every feature quantity with formula (1). Fixed values of the ratios ω1 and ω0, computed in advance, may also be set so that the separation becomes larger.

The feature-set creation part 1 then uses formula (3), for each target cluster, to compute the discriminant criterion λ against the other clusters for any or all combinations of the feature quantities constituting the learning data, arranges the computed values of λ in descending order, and outputs the ranked list of λ. It stores the combination of feature quantities corresponding to the largest λ as the target cluster's feature set, together with the value of λ, in the feature-set storage part 4, in association with the cluster's identification information.

The determination of λ proceeds as shown in Fig. 2(a): when the feature quantities of the learning data and classification target data are the four quantities a, b, c and d, the feature-set creation part 1 computes λ for every combination (all four, several, or one of them) and selects the highest value; in Fig. 2(a), the combination of the feature quantities b and c.

As another method of determining λ, there is the BSS method shown in Fig. 2(b): λ is computed using all n feature quantities contained in the set of the classification target data; next, for the combinations of n-1 quantities drawn from the n, λ is computed; from those n-1-quantity values the combination of maximum λ is selected, and next, from those n-1 feature quantities, λ is computed for all combinations of n-2. In this way the feature-set creation part 1 may be configured to remove feature quantities from the set one by one in rank order, at each step computing λ for the combinations reduced by one, so that a combination discriminable with fewer feature quantities is selected.

As yet another method of determining λ, there is the FSS method shown in Fig. 2(c): for each single one of the n feature quantities contained in the set of the classification target data, λ is computed, and the quantity with the largest λ is selected. Next, combinations of two feature quantities, that quantity plus each of the others, are generated and λ is computed for each combination; from these the combination of maximum λ is selected. Next, combinations of three feature quantities, that combination plus each feature not contained in it, are generated and λ computed for each. In this way the feature-set creation part 1 may be configured to grow the combination by one absent feature quantity at a time, always selecting the combination of maximum λ from the combinations so far, and finally to select as the feature set the combination with the largest λ among all combinations for which λ was computed.

Figs. 3 and 4 show the effectiveness of selecting the feature set used for clustering by the discriminant criterion λ.

Fig. 3 considers, as candidate feature sets, the combination of feature quantities a and g, the combination of a and h, and the combination of d and e, and shows that, for cluster 1, cluster 2 and cluster 3, a feature set with higher classification power than the prior art is selected from among these combinations. In Fig. 3, μ1 corresponds to μ1, μ2 to μ0, σ1 to σ1, σ2 to σ0, ω1 to ω1 and ω2 to ω0. Among these combinations, the value of λ is largest for the combination of a and h; using that combination to separate cluster 1 from the remaining clusters, Fig. 4 confirms the classification result of cluster 1 against the other clusters (clusters 2 and 3).

In Fig. 4, the horizontal axis is the log of the Mahalanobis distance computed with the combination of feature quantities, and the vertical axis is the number of classification target data having the corresponding value (a histogram). The value 1.4 on the horizontal axis means log distances below 1.4 and at least 1.2 (the value to the left of 1.4); the other values on the horizontal axis read likewise, and "1.4 or more" denotes values of at least 1.4. The Mahalanobis distances of Fig. 4 are computed, using the feature set of cluster 1, for the classification target data belonging to cluster 1 and to the other clusters. Fig. 4(a) is computed with the combination of a and g, Fig. 4(b) with the combination of a and h, and Fig. 4(c) with the combination of d and e. The histograms of Fig. 4 show that when the value of λ is large, cluster 1 and the other clusters are classified well.

Next, the operation of the clustering system of the embodiment of Fig. 1 is described with reference to Figs. 5 and 6. Fig. 5 is a flowchart showing the operation of the feature-set creation part 1 of the first embodiment; Fig. 6 is a flowchart showing the clustering of classification target data.

In the following description, when the classification target data are, for example, the set of feature quantities of a scratch on a glass article, image processing or measurement yields as the feature quantities "a: length of the scratch", "b: area of the scratch", "c: width of the scratch", "d: transmittance of the specific region containing the scratch portion" and "e: reflectance of the specific region containing the scratch"; the set of feature quantities is then {a, b, c, d, e}. In this embodiment the distance used for cluster analysis is computed as the Mahalanobis distance over the standardized feature quantities. Examples of the glass article of this embodiment are plate glass and glass substrates for displays.

A. Feature-set creation processing (flowchart of Fig. 5)

The user detects scratches on the glass, photographs them to obtain image data, and by image processing extracts from the image data the feature quantities, such as length measurements of the scratch portions, collecting feature data consisting of the set of those quantities. For each cluster into which the user pre-classifies scratches by cause or shape, the feature quantities are divided, according to the causes or shapes known in advance, into learning data forming each cluster's population, and stored from a processing terminal (not shown) into the cluster database 5, in association with the cluster's identification information (step S1).

Next, when the processing terminal inputs, for each cluster, the control command to generate a feature set, the feature-set creation part 1 reads the population of learning data from the cluster database 5 by the identification information of each cluster. The feature-set creation part 1 then computes, per cluster, the mean and standard deviation of each feature quantity of the within-cluster population, and with them computes from formula (1) the standardized feature quantities of each learning datum.
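With the terms of formula (2) defined above, a minimal sketch of the per-cluster Mahalanobis squared distance follows. The pseudo-inverse is an assumption added here to guard against a singular correlation matrix on small samples; the patent itself specifies only the inverse of R.

```python
import numpy as np

def mahalanobis_sq(v, cluster_learning_data, feature_idx):
    """Formula (2): MHD = (1/n) * V^T R^-1 V over one cluster's feature set;
    R is the correlation matrix of the cluster's learning data."""
    data = cluster_learning_data[:, feature_idx]
    n = len(feature_idx)                         # degrees of freedom
    V = (v[feature_idx] - data.mean(axis=0)) / data.std(axis=0, ddof=1)
    R = np.atleast_2d(np.corrcoef(data, rowvar=False))
    return float(V @ np.linalg.pinv(R) @ V) / n
```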
Next, the feature-set creation part 1 computes the discriminant criterion λ by formula (3) for every combination of feature quantities contained in each candidate feature set. For each cluster it uses the standardized feature quantities of the within-cluster population to compute the mean vector (barycentric vector) μ1 of the vectors formed from the feature quantities of each feature set, and the standard deviation σ1 of the learning-data vectors formed from the feature quantities of the set in the within-cluster population; it uses the standardized feature quantities of the outside population to compute the barycentric vector μ0 and the standard deviation σ0 of the learning-data vectors of the outside population; and it computes ω1, the ratio of the number of learning data of the within-cluster population to the total number of learning data, and ω0, the ratio of the number of learning data of the outside population to the total.

Using the barycentric vectors μ1 and μ0, the standard deviations σ1 and σ0 and the ratios ω1 and ω0, the feature-set creation part 1 computes by formula (3), for every combination of feature quantities of each cluster, the discriminant criterion of the distance between that cluster and the other clusters. When all values of λ have been computed, it arranges them per cluster in descending order and determines, as the feature set by which each cluster is identified, the combination of feature quantities corresponding to the largest λ, the set used for computing distances (step S2).

Since the distance calculation part 3 is to compute the distances, the feature-set creation part 1 then computes, for each feature set, the correlation matrix R between its feature quantities, and the mean avg(i) and standard deviation std(i) of each feature quantity over the learning data of the within-cluster population (step S3).

From the computed values of λ, the feature-set creation part 1 also derives the correction coefficient λ^(-1/2). This coefficient standardizes across feature sets: the distance from a cluster to the other clusters varies from cluster to cluster, so to raise the classification precision the set distances must be standardized across feature sets. The coefficient need not be λ^(-1/2): log(λ), or another simple function of λ, may be used, any function of λ that can standardize across feature sets.

In formula (3), when computing the barycentric vector μ0 of the outside population, the learning data of the outside population may be chosen in any of the following three ways:

a. all learning data of the outside population among all the learning data;
b. specific learning data of the outside population corresponding to the purpose of the classification;
c. the learning data of the outside population used for feature selection.

The classification purpose of b. is to draw a clear distinction from a cluster of particular interest; the learning data used are those contained in the other clusters from which the difference is to be established.

The feature-set creation part 1 then stores, for each cluster's identification information, the feature set; the correction coefficient of the feature set, in this embodiment the value λ^(-1/2); the inverse correlation matrix; the means avg(i); and the standard deviations std(i), as the distance-calculation data, in the feature-set storage part 4.

B. Clustering processing (flowchart of Fig. 6)

When classification target data are input, the feature extraction part 2 reads the feature set of each cluster from the feature-set storage part 4 by the cluster's identification signal. According to the kinds of feature quantities in the set read out, it extracts the feature quantities from the classification target data for each cluster, and stores the extracted quantities in its internal storage in association with each cluster's identification information (step S11).

Next, the distance calculation part 3 reads from the feature-set storage part 4 the mean avg(i) and standard deviation std(i) corresponding to each feature quantity, standardizes the feature quantities of the classification target data by the operation of formula (1), and replaces the feature quantities in the internal storage with the standardized ones. It then forms the vector V from the elements V(i) obtained as above, computes its transpose V^T, computes by formula (2) the Mahalanobis distance between the classification target data and each cluster in turn, and stores the distances in the internal storage in association with the clusters' identification information (step S12).

Next, the distance calculation part 3 multiplies each computed Mahalanobis distance by the correction coefficient λ^(-1/2) of the corresponding feature set, obtaining the corrected distances, which replace the Mahalanobis distances (step S13). The multiplication by the correction coefficient may also be applied after taking the log or the square root of the Mahalanobis distance.

The distance calculation part 3 then compares the corrected distances of the clusters in the internal storage (step S14), detects the minimum corrected distance, takes the cluster of the identification information corresponding to that corrected distance as the cluster of the classification target data, and stores the classified data in the cluster database 5 in association with the identification information of the destination cluster (step S15).

[Second Embodiment]

In the first embodiment, one kind of feature set per cluster is used for cluster analysis. As in the second embodiment described below, a plurality of feature sets may instead be set per cluster: the Mahalanobis distance is computed for each feature set, the corrected distances are computed and arranged in ascending order, and the corrected distances within a given number of top ranks determine, according to rules set in advance, the cluster to which the classification target belongs. That is, the distance calculation part 3 of this embodiment detects which cluster the classification target data belong to by means of rule patterns expressing classification criteria over the ranks of the distances between the classification target data and the clusters obtained for the individual feature sets.

The configuration of the second embodiment is the same as that of the first embodiment shown in Fig. 1; the same reference numerals are assigned, and only the operations differing from the first embodiment are described, using Fig. 7. The second embodiment includes processing that derives the rule patterns from learning data. Fig. 7 is a flowchart of the pattern learning that derives the rule patterns over the distance ranks; Figs. 8 and 9 are flowcharts of the clustering of the second embodiment.

In the first embodiment, when creating feature sets, the feature-set creation part 1 computes λ for many candidate combinations per cluster and assigns to each cluster the single feature set of maximum λ. In the second embodiment, by contrast, the feature-set creation part 1 determines for each cluster, against one, several or all other clusters, the maxima of λ over the candidate combinations, obtaining a plurality of values of λ, and assigns to each cluster a plurality of feature sets for separating it from the others. It then computes the distance-calculation data for every feature set and stores, in association with the cluster's identification information, the feature quantities and the distance-calculation data of the feature sets in the feature-set storage part 4.

In Fig. 7, when learning data are input, the feature extraction part 2 reads the plurality of feature sets of each cluster from the feature-set storage part 4 by the clusters' identification signals. According to the kinds of feature quantities in each set read out, it extracts the feature quantities from the learning data for each cluster and stores the extracted quantities per feature set in the internal storage, in association with the clusters' identification information (step S21).
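Returning to the feature-set creation of steps S2 and S3 above, the sketch below implements formula (3) and the exhaustive selection of Fig. 2(a). The reduction of each sample's feature subset to a scalar score via the vector norm is an assumption made here for illustration; the patent describes μ and σ in terms of vectors formed from the standardized feature quantities without fixing the scalarization.

```python
import numpy as np
from itertools import combinations

def discriminant_lambda(in_data, out_data):
    """Formula (3): lambda = w0*w1*(mu0 - mu1)^2 / (w0*s0^2 + w1*s1^2).
    in_data: standardized features of the target cluster's population;
    out_data: standardized features of the other clusters' populations."""
    n1, n0 = len(in_data), len(out_data)
    w1, w0 = n1 / (n1 + n0), n0 / (n1 + n0)
    s_in = np.linalg.norm(in_data, axis=1)    # scalar score per sample (assumed)
    s_out = np.linalg.norm(out_data, axis=1)
    mu1, mu0 = s_in.mean(), s_out.mean()
    sd1, sd0 = s_in.std(ddof=1), s_out.std(ddof=1)
    return w0 * w1 * (mu0 - mu1) ** 2 / (w0 * sd0 ** 2 + w1 * sd1 ** 2)

def best_feature_set(in_data, out_data):
    """Exhaustive search of Fig. 2(a): return the subset of maximal lambda."""
    n_features = in_data.shape[1]
    best, best_lam = None, -np.inf
    for k in range(1, n_features + 1):
        for subset in combinations(range(n_features), k):
            idx = list(subset)
            lam = discriminant_lambda(in_data[:, idx], out_data[:, idx])
            if lam > best_lam:
                best, best_lam = idx, lam
    return best, best_lam
```

The BSS and FSS methods of Figs. 2(b) and 2(c) would replace the double loop with backward elimination or forward selection over the same criterion.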
Next, the distance calculation part 3 reads from the feature-set storage part 4, per feature set, the mean avg(i) and standard deviation std(i) corresponding to each feature quantity, standardizes by the operation of formula (1) the feature quantities extracted from the learning data, and replaces the feature quantities in the internal storage with the standardized ones. It then forms the vector V from the elements V(i) obtained as above, computes its transpose V^T, computes by formula (2) the Mahalanobis distance between the learning datum and each cluster for every feature set, and stores the distances per feature set in the internal storage, in association with the clusters' identification information (step S22).

Next, the distance calculation part 3 multiplies the computed Mahalanobis distances by the correction coefficients λ^(-1/2) of the corresponding feature sets, obtaining the corrected distances (step S23). It then arranges the corrected distances between the clusters in the internal storage in ascending order (small corrected distances ranked highest); that is, the identification information of the clusters is arranged so that the clusters with small corrected distances to the classification target come first (step S24).

The distance calculation part 3 then reads the clusters' identification information for the corrected distances from the smallest (highest rank) to the n-th, and counts the occurrences of each cluster's identification information among those n; that is, it casts votes for the clusters. It then detects, over the learning data, the patterns of these per-cluster counts that are common to the learning data contained in the same cluster, the rule patterns. For example, with n set to 10, if for learning data of cluster B a pattern with counts of 5 for cluster A, 3 for cluster B and 2 for cluster C is detected, this becomes rule 1. Likewise, if for learning data of cluster C three occurrences of cluster C are detected, then even with 7 for cluster A and 0 for cluster B, counts not necessarily common to cluster C, cluster C may be assigned whenever its count is 3 or more, regardless of the other clusters' counts; this is rule 2. Further, if for learning data of cluster A the distances of cluster A occupy the first and second ranks from the top, cluster A may be assigned regardless of the other clusters' counts, even if the count of cluster B is 8; this is rule 3.

As above, the regularity of the per-cluster counts of the learning data classified into the same cluster is detected and stored internally as a pattern table, per cluster identification information. One rule may be set per cluster, or several. In the above description the distance calculation part 3 extracts the rule patterns, but the user may set the counts or the rule patterns arbitrarily to change the classification precision of each cluster.

Depending on the cluster, some clusters have feature characteristics similar to other clusters; in such cases classification from the relations among several clusters, that is, from the per-cluster counts or from the pattern of the ranks from the top, can be the more precise way of classifying the classification target data, and this embodiment covers that case.

Next, the clustering of the second embodiment using the rules recorded in the table is described with the flowchart of Fig. 8.

When classification target data are input, the feature extraction part 2 reads the plurality of feature sets corresponding to each cluster from the feature-set storage part 4 by the clusters' identification signals. According to the kinds of feature quantities in each set read out, it extracts the feature quantities from the classification target data for each cluster and stores the extracted quantities per feature set in the internal storage, in association with the clusters' identification information (step S31).

Next, the distance calculation part 3 reads, per feature set, the mean avg(i) and standard deviation std(i) from the feature-set storage part 4, standardizes by formula (1) the feature quantities extracted from the classification target data, and replaces the stored feature quantities with the standardized ones. It forms the vector V, computes its transpose V^T, computes by formula (2) the Mahalanobis distance between the classification target data and each cluster for every feature set, and stores the distances per feature set in the internal storage, in association with the clusters' identification information (step S32). It multiplies the Mahalanobis distances by the correction coefficients λ^(-1/2) of the feature sets, obtaining the corrected distances (step S33), and arranges the corrected distances in the internal storage in ascending order, so that the identification information of the clusters with small corrected distances to the classification target comes first (step S34).

After the arrangement, the distance calculation part 3 reads the clusters' identification information for the corrected distances from the smallest (highest rank) to the n-th and counts the occurrences of each cluster's identification information among the n, voting for the clusters. It then checks whether the pattern of per-cluster counts (or the pattern of ranks) among the top n of the classification target data exists in the internally stored table (step S35). When the check finds a rule pattern in the table that matches the target pattern of the classification target data, the distance calculation part 3 judges that the data belong to the cluster of the identification information corresponding to the matching rule, and classifies the data into that cluster (step S36).

A further clustering procedure of the second embodiment using the rules recorded in the table is described with the flowchart of Fig. 9. In this procedure, steps S31 to S35 are the same as in Fig. 8; as described at step S35, the distance calculation part 3 checks the stored rule patterns against the target pattern of the classification target data. It then detects whether a rule pattern matching the target pattern has been retrieved: when a matching rule pattern is found, processing moves to step S47; when none is found, processing moves to step S48 (step S46). When a matching rule pattern is retrieved, the distance calculation part 3 judges that the classification target data belong to the cluster of the identification information corresponding to the matching rule, classifies the data into that cluster, and stores the classified data in the cluster database 5 in association with the identification information of the destination cluster (step S47). When no matching rule pattern is retrieved, the distance calculation part 3 detects the identification information with the largest count, that is, the most votes, classifies the classification target data into the cluster of that identification information, and stores the classified data in the cluster database 5 in association with the identification information of the destination cluster (step S48).

[Third Embodiment]

The second embodiment prepares a table of rule patterns over the top n ranks of the computed distances between the classification target data and the clusters (small distance meaning high similarity) and clusters each classification target datum according to whether it matches a rule pattern in the table. As in the third embodiment below, it is also possible to set a plurality of feature sets per cluster, compute the Mahalanobis distance for each feature set, compute the corrected distances, and take the cluster with the most corrected distances within the top ranks as the cluster of the classification target data, without a rule table.

The configuration of the third embodiment is the same as that of the first and second embodiments shown in Fig. 1; the same reference numerals are assigned, and only the operations differing from the second embodiment are described, using Fig. 10. The third embodiment has no processing that derives the rules from learning data; step S48 of Fig. 9 is executed directly. Fig. 10 is a flowchart of the clustering of the third embodiment.

In this procedure, the processing from step S31 to step S34 is the same as in Fig. 8: as described before, in step S34 the distance calculation part 3 arranges the corrected distances between the clusters in the internal storage in ascending order, so that the cluster information with small corrected distances to the classification target comes first (step S34). It then detects the identification information of the clusters corresponding to the corrected distances from the smallest (highest rank) to the n-th and counts the occurrences of each cluster's identification information among the n, performing the voting (step S55). Among the voting results, it detects the identification information with the largest count (number of votes), takes the cluster corresponding to that identification information as the cluster of the classification target data, and stores the classified data in the cluster database 5 in association with the identification information of the destination cluster (step S56).

The distance calculation part 3 may also hold, per identification information, a threshold on the number of votes, set by the user for screening in advance: when the number of votes of the identification information with the most votes falls below the threshold, processing may judge that the data belong to no cluster. For example, when classifying classification target data against the three clusters A, B and C, if the votes for the identification information of cluster A number 5, those for cluster B number 3 and those for cluster C number 2, the distance calculation part 3 detects cluster A as the identification information with the most votes. But when the threshold for cluster A is set to 6, the votes for the identification information of cluster A fall short of the threshold, and the data are judged to belong to no cluster. In this way, in the cluster analysis of clusters whose feature quantities differ only slightly from other clusters, the reliability of the classification of the classification target data can be raised.
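A sketch of the third embodiment's voting (steps S34, S55 and S56), including the optional vote threshold, follows; the list of per-feature-set corrected distances is assumed to be precomputed, for example with the mahalanobis_sq sketch above multiplied by a correction coefficient.

```python
from collections import Counter

def classify_by_voting(corrected, n_top=10, thresholds=None):
    """corrected: list of (cluster_id, corrected_distance), one entry per
    feature set over all clusters. Sort ascending (step S34), vote over the
    top n (step S55), pick the most-voted cluster (step S56); an optional
    per-cluster vote threshold may reject the result."""
    ranked = sorted(corrected, key=lambda t: t[1])[:n_top]
    votes = Counter(cid for cid, _ in ranked)
    winner, count = votes.most_common(1)[0]
    if thresholds and count < thresholds.get(winner, 0):
        return None                     # belongs to no cluster
    return winner

# Worked example from the text: votes A=5, B=3, C=2, threshold 6 on A
corrected = [("A", d) for d in (0.1, 0.2, 0.3, 0.4, 0.5)] + \
            [("B", d) for d in (0.6, 0.7, 0.8)] + [("C", d) for d in (0.9, 1.0)]
print(classify_by_voting(corrected, 10))                      # -> "A"
print(classify_by_voting(corrected, 10, thresholds={"A": 6})) # -> None
```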
[Transformation of feature quantities]

Cluster analysis is performed in the expectation that the population of each feature quantity is normally distributed; but depending on the kind of feature quantity (area, length and so on), a normal distribution may not form and the population may be skewed, and the computation of the distance between the classification target data and each cluster, that is, of their similarity, can then lose precision. For some feature quantities it is therefore necessary to transform the population's feature quantity by a given method so that it approaches a normal distribution, raising the precision of the similarity judgment. As the transformation toward the normal distribution, any arithmetic expression may be used that obtains the feature quantity through a function containing a log, an n-th root such as the square root or the cube root, a factorial, or another numeric computation.

The method of transforming each feature quantity is described below using Fig. 11, a flowchart of the processing that sets the transformation method of each feature quantity. The transformation method is set per cluster, for each feature quantity contained in the cluster, using the learning data belonging to each cluster. The processing below is described as executed by the feature-set creation part 1, but another processing part corresponding to this processing may be provided.

The feature-set creation part 1, keyed by the identification information of the target cluster, reads the cluster's learning data from the cluster database and computes (normalizes) the feature quantities of each learning datum (step S61). It transforms the feature quantities of the learning data that were read out, using one of the internally stored transformation expressions (step S62). When the transformation of the feature quantities of all learning data is finished, it computes an evaluation value expressing whether the distribution obtained by the transformation is close to a normal distribution (step S63). It then checks whether evaluation values have been computed for all the internally stored expressions, that is, for all transformation methods set in advance: when an evaluation value of the transformed distribution has been computed for every expression, processing advances to step S65; when expressions remain, processing returns to step S62 to execute the next set expression (step S64). When the feature-quantity transformations of all expressions are finished, the feature-set creation part 1 detects, among the distributions obtained with the set expressions, the one with the smallest evaluation value, that is, the distribution closest to normal, decides on the transformation method of the expression that produced the detected distribution, and sets it internally as the transformation method of that feature quantity of the cluster (step S65). The feature-set creation part 1 executes this processing for every feature quantity of every cluster, setting a transformation method corresponding to each feature quantity of each cluster.

Next, the computation of the evaluation value in step S63 is described with Fig. 12, a flowchart of the processing that evaluates a distribution obtained with an expression. The feature-set creation part 1 transforms the feature quantities of the learning data belonging to the target cluster with the set expression (step S71). After transforming the feature quantities of all learning data, it computes the mean μ and standard deviation σ of the distribution (population) obtained with the transformed feature quantities (step S72). Using the mean μ and standard deviation σ of the population, it computes the z-value (1) as (x - μ)/σ (step S73). It then computes the cumulative probability within the population (step S74), and from the cumulative probability computes the value of the inverse of the cumulative distribution function of the standard normal distribution as the z-value (2) (step S75). It then takes the difference of the two z-values of the distribution of the feature quantity, that is, the error between z-value (1) and z-value (2) (step S76), and computes the sum of these errors, that is, the total of the squared errors, as the evaluation value (step S77). The smaller the error between the two z-values, the closer the distribution is to a normal distribution; with no error the distribution is normal, and the further the distribution departs from normal, the larger the error.

Next, the feature quantities of the classification target data, prepared before executing the clustering of the first to third embodiments, are described with Fig. 13, a flowchart of the computation of the feature data of the classification target data. The distance calculation part 3 extracts from the input classification target data the feature quantities to be discriminated, corresponding to the feature set assigned to each cluster, and executes the normalization already described (step S81). It transforms, by the transformation method (expression) set for the feature quantities of the cluster, the feature quantities used in the classification target data for classification into the target cluster (step S82). It then computes the distance to the target cluster, as described in the first to third embodiments (step S83). It detects whether the distances have been computed, with the feature quantities transformed by the transformation methods set for each cluster's feature quantities, for all destination clusters: when the distances to all destination clusters have been obtained, processing advances to step S85; when destination clusters remain, processing returns to step S82 (step S84). Then, in each of the first to third embodiments, the processing from the point where the distance computation ends is started (step S85).

For the above reasons, in the Mahalanobis distance used in this embodiment, when the distance between the classification target data and each cluster is computed, the feature quantities are expected to be normally distributed; the closer the distribution of each feature quantity of the population is to a normal distribution, the more correctly the distance (similarity) between the clusters can be obtained, and the better the precision of the classification into each cluster can be expected to be.

[Examples]

[Calculation example]

Next, using the clustering systems of the first, second and third embodiments, the classification precision was compared with the prior art on the sample data shown in Fig. 14. Although the number of samples is small and few feature quantities are used, a correct rate equal to or above that of the prior art was obtained. In Fig. 14, ten learning data are defined for each of the clusters kind 1, kind 2 and kind 3, each learning datum having the eight feature quantities a, b, c, d, e, f, g and h. In this example, the feature sets used for clustering are determined from the learning data belonging to the clusters shown in Fig. 14, and the same learning data are then used as the classification target data for the cluster analysis.

As the calculation results, Fig. 15 shows, for the conventional calculation method, the Mahalanobis distances and judgment results computed with the feature quantities a and g as the combination, for the learning data of clusters 1 to 3 shown in Fig. 14. In Fig. 15(a), the column Cluster1 is the Mahalanobis distance to cluster 1, Cluster2 the distance to cluster 2, and Cluster3 the distance to cluster 3. The "kind" column shows the cluster to which each learning datum actually belongs, and the judgment result shows the cluster for which the Mahalanobis distance of the learning datum is smallest; feature data whose kind and judgment result agree are correctly classified. In Fig. 15(b), the row number shows the cluster to which the learning data belong, and the column number the judged cluster: for example, the "8" marked R1 means that 8 of the 10 data of cluster 1 were judged as cluster 1, and the "2" marked R2 means that 2 of the 10 data of cluster 1 were judged as cluster 3. p0 is the agreement rate between the correct answers and the judgments, p1 is the rate at which the two agree by chance, and k is the overall corrected judgment rate, obtained by the following formulas; the higher k, the higher the classification precision:

    k = (p0 - p1) / (1 - p1)
    p0 = (a + d) / (a + b + c + d)
    p1 = [(a + b)*(a + c) + (c + d)*(b + d)] / (a + b + c + d)^2

The relation of a, b, c and d in these formulas is explained with Fig. 16. The number of data belonging to cluster 1 classified as cluster 1 is a, and the number of data belonging to cluster 1 classified as cluster 2 is b; a + b is the number of data belonging to cluster 1. Likewise, the number of data belonging to cluster 2 classified as cluster 2 is d, and the number of data belonging to cluster 2 classified as cluster 1 is c; c + d is the number of data belonging to cluster 2. a + c is the number classified as cluster 1 among all data a + b + c + d, and b + d the number classified as cluster 2 among all data a + b + c + d.
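The corrected judgment rate k is Cohen's kappa over the two-cluster table of Fig. 16; the chance-agreement term p1 shown above is reconstructed on that reading of the garbled source formula. A small sketch with hypothetical counts:

```python
def kappa(a, b, c, d):
    """k = (p0 - p1) / (1 - p1) over the confusion counts of Fig. 16:
    a and d are correct assignments, b and c the misassignments."""
    n = a + b + c + d
    p0 = (a + d) / n
    p1 = ((a + b) * (a + c) + (c + d) * (b + d)) / n ** 2
    return (p0 - p1) / (1 - p1)

# Hypothetical example: 8 of 10 cluster-1 data and 9 of 10 cluster-2 data correct
print(kappa(8, 2, 1, 9))  # -> 0.7
```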
Next, Fig. 17 shows the Mahalanobis distances and judgment results computed by the calculation method of the first embodiment for the learning data of clusters 1 to 3 shown in Fig. 14. Figs. 17(a) and (b) are read in the same way as Fig. 15, so the explanation is omitted. The correct rate p0, the chance agreement rate p1 and the overall corrected judgment rate k equal those of the conventional calculation method of Fig. 15. Here the feature set of each cluster was computed by the method of selecting, from all the combinations, the combination with the largest discriminant criterion λ per cluster: the combination of the feature quantities a and h as the feature set of cluster 1, a and d for cluster 2, and a and g for cluster 3.

Next, Fig. 18 shows the Mahalanobis distances and judgment results computed by the calculation method of the second embodiment for the learning data of clusters 1 to 3 of Fig. 14. Figs. 18(a) and (b) are read in the same way as Fig. 15, so the explanation is omitted. The correct rate p0 is 0.8333, the chance agreement rate p1 is 0.3333 and the overall corrected judgment rate k is 0.75, a higher classification precision than the conventional calculation of Fig. 15. Here the feature sets of each cluster were computed by the method of selecting, from all the combinations, the combinations with λ down to the third rank per cluster: the three combinations {a, h}, {a, g} and {d, e} as the feature sets of cluster 1; the three combinations {a, f}, {a, d} and {a, b} for cluster 2; and the three combinations {e, g}, {a, c} and {a, g} for cluster 3. The voting judgment arranges the Mahalanobis distances in ascending order, counts the number of times each cluster enters the top three, and takes the cluster with the largest count as the cluster of the classification target data.

Next, Fig. 19 shows the judgment results when, for the learning data of clusters 1 to 3 of Fig. 14, the Mahalanobis distances are computed by the calculation method of the second embodiment, multiplied by the correction coefficient λ^(-1/2), and then ranked by distance. Figs. 19(a) and (b) are read in the same way as Fig. 15, so the explanation is omitted. The correct rate p0 is 0.8333, the chance agreement rate p1 is 0.3333 and k is 0.75, again a higher precision than the conventional calculation of Fig. 15. The feature sets and the voting judgment are the same as for Fig. 18.

From the classification results of Figs. 15, 17, 18 and 19, the present embodiments perform clustering at high speed and with high precision compared with the prior art, confirming the superiority of the embodiments over the prior art.

[Application examples of the present invention]

A. Inspection device

An inspection device (defect inspection device) that classifies the kinds of scratches on an inspection object, for example the surface of a glass substrate, is described with Fig. 20. Fig. 21 is a flowchart of the selection of the feature sets, and Fig. 22 a flowchart of the clustering.

First, the selection of the feature sets: the collection of learning data in step S1 of the flowchart of Fig. 5 corresponds to steps S101 to S105 of the flowchart of Fig. 21; steps S2 to S4 of Fig. 21 are the same as in the flowchart of Fig. 5, so their explanation is omitted. By the operator's operation, samples are collected for the learning data of each cluster of the scratch kinds to be classified (step S101). The illumination device 102 illuminates the shapes of the scratches collected as learning data, and the image acquisition part 101 obtains via the imaging device 103 the image data of the scratch portions (step S102). From the image data obtained by the image acquisition part 101, the feature quantities of the scratches of each learning datum are computed (step S103). The feature quantities of the obtained learning data are each classified into the destinations obtained by visual inspection, identifying the learning data of each cluster (step S104). The processing from step S101 to step S102 is repeated until the learning data of each cluster reach a given number (the preset number of samples), for example about 30 each; when the given number is reached, the cluster part 105 executes all the processing from step S2 onward described with Fig. 5. Here the cluster part 105 is the clustering system of the first or second embodiment.

Next, the clustering in the inspection device of Fig. 20 is described with reference to Fig. 22. Steps S31 to S34, S55 and S56 of Fig. 22 are the same as in the flowchart of Fig. 10, so their explanation is omitted. In the inspection device of Fig. 20, when inspection starts, the illumination device 102 illuminates the glass substrate that is the inspection object 100, and the imaging device 103 photographs the substrate surface and outputs the photographed image to the image acquisition part 101. The defect-candidate detection part 104 detects, in the photographed image input from the image acquisition part, the portions differing from a flat shape, and sets them as the defect candidates to be classified (step S201). The defect-candidate detection part 104 then cuts out from the photographed image the image data of the defect-candidate portions as the classification target data, computes the feature quantities from the image data of the classification target data, and outputs to the cluster part 105 the classification target data consisting of the set of extracted feature quantities (step S202). The subsequent clustering has been described with the steps of Fig. 10 and is omitted. As above, the inspection device of the present invention can classify the scratches present on a glass substrate into scratch kinds with high precision.

B. Defect kind judging device

In the defect kind judging device shown in Fig. 23, the cluster part 105 corresponds to the clustering system of the present invention already described. The image acquisition device 201 consists of the image acquisition part 101, the illumination device 102 and the imaging device 103 of Fig. 20. The learning data of each cluster, the destinations into which the classification target data are classified, have already been obtained and prepared in the cluster database 5 of the cluster part 105, and the feature-set creation of Fig. 5 has also been completed. The imaging devices of the image acquisition devices mounted on the manufacturing devices detect defect candidates, cut out the image data, extract the feature quantities and output them to the data collection device 203. The control device 200 transfers the classification target data input into the data collection device 203 to the cluster part 105; then, as already described, the cluster part 105 classifies the input classification target data into the clusters corresponding to the kinds of scratches.

C. Manufacturing management device

The manufacturing management device of the present invention consists, as shown in Fig. 24, of a control device 300, manufacturing devices 301 and 302, a notification part 303, a recording part 304, a faulty-device judgment part 305 and a defect kind judgment device 306. The defect kind judgment device 306 is the same as described in B above: in its defect-candidate detection part 104 it processes the photographed images from the image acquisition devices 201 and 202 installed on the manufacturing devices 301 and 302, extracts the feature quantities and executes the classification of the classification target data.

The faulty-device judgment part 305 has a table relating the identification information of the destination clusters to the causes corresponding to those clusters; it reads from the table the cause associated with the identification information of the destination cluster input from the defect kind judgment device 306 and judges the manufacturing device that is the cause. That is, the faulty-device judgment part 305 detects, corresponding to the cluster's identification information, the cause of the defect in the product's process. It then notifies the operator through the notification part 303 and records in the recording part 304, against the date of the judgment, the identification number of the cluster of the defect classification, the cause of occurrence, and the identification information of the manufacturing device, as a history. Further, the control device 300 stops the manufacturing device judged faulty by the faulty-device judgment part 305 and controls its control parameters.

D. Manufacturing management device

Another manufacturing management device of the present invention consists, as shown in Fig. 25, of the control device 300, the manufacturing devices 301 and 302, the notification part 303, the recording part 304 and the cluster part 105. The cluster part 105 has the same configuration as described in A and B above. Unlike the cases A to C, the feature data of the classification target data here consist of feature quantities of the manufacturing conditions in the manufacturing process of, for example, a glass substrate (amounts of materials, processing temperature, pressure, processing speed and so on), classified by the manufacturing state of each step of the process; these feature quantities are detected by the sensors installed on the manufacturing devices 301 and 302 and input to the cluster part 105 as process information.

That is, by the above classification the cluster part 105 classifies the glass manufacturing process of each step of each manufacturing device into clusters such as "normal state", "state prone to defects, requiring adjustment" and "dangerous state requiring adjustment". The cluster part 105 then notifies the operator of the classification result through the notification part 303, outputs the identification information of the cluster of the classification result to the control device 300, and has the recording part 304 record, against the date of the judgment, the classification identification number of the manufacturing state of each step, the manufacturing conditions whose feature quantities were most problematic, and the identification information of the manufacturing device, as a history. The control device 300 has a table showing the adjustment items, and the data they act on, for returning the manufacturing conditions of each cluster's identification information to normal; it reads the adjustment items and data corresponding to the cluster identification information input from the cluster part 105 so that the manufacturing conditions return to normal, and controls the corresponding manufacturing device with the data read out.
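To connect application examples A through D, the sketch below strings together the inspection flow of Figs. 20 and 22 (steps S201 and S202 feeding the third embodiment's voting), reusing the mahalanobis_sq and classify_by_voting sketches above. Image capture and feature extraction are stubbed, since the patent leaves them to conventional image processing; the dictionary layout for the per-cluster feature sets is an assumption.

```python
import numpy as np

def extract_scratch_features(defect_image):
    """Stub for steps S201-S202: cut out the defect candidate and measure
    the scratch features {a..e}; a real system would do image processing here."""
    return np.array([5.2, 85.0, 0.28, 0.91, 0.12])  # placeholder measurements

def inspect(defect_image, clusters, n_top=10):
    """clusters: {cluster_id: [(feature_idx, learning_data, corr_coef), ...]},
    one entry per feature set, as prepared by the feature-set creation part."""
    v = extract_scratch_features(defect_image)
    corrected = []
    for cid, feature_sets in clusters.items():
        for feature_idx, learning, coef in feature_sets:
            d = mahalanobis_sq(v, learning, feature_idx)  # sketch from above
            corrected.append((cid, coef * d))
    return classify_by_voting(corrected, n_top)           # sketch from above
```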
Further, the functions of the clustering system of Fig. 1 may be realized by recording a program for them on a computer-readable recording medium and having a computer system read in and execute the program on that medium, thereby performing the clustering of the classification target data. The "computer system" here includes an OS and hardware such as peripheral devices, and also includes WWW systems provided with a web-page providing environment (or display environment). "Computer-readable recording medium" means portable media such as flexible disks, magneto-optical disks, ROMs and CD-ROMs, and storage devices such as hard disks built into computer systems; it also includes media that hold the program for a fixed time, such as the volatile memory (RAM) inside a computer system acting as server or client when the program is transmitted over a network such as the Internet or over a communication line such as a telephone line.

The program may also be transmitted, from a computer system whose storage device or the like stores it, to another computer system via a transmission medium or by transmission waves in a transmission medium. Here the "transmission medium" transmitting the program means a medium with the function of transmitting information, such as a network (communication network) like the Internet or a communication line such as a telephone line. The program may realize only a part of the functions described above, or may realize them in combination with a program already recorded in the computer system, a differential file (differential program).

[Industrial applicability]

The present invention can be applied to classifying and discriminating, with high precision, information having many kinds of feature quantities, as in the defect inspection of glass articles and the like, and can also be used in manufacturing state detection devices and product manufacturing management devices.

[Brief description of the drawings]

Fig. 1 is a block diagram showing a configuration example of the clustering system of the first and second embodiments of the present invention.
Fig. 2 is a table explaining the processing of selecting feature sets by the discriminant criterion λ.
Fig. 3 is a table explaining the processing of selecting feature sets by the discriminant criterion λ.
Fig. 4 is a histogram explaining the effect of selecting feature sets by the discriminant criterion λ.
Fig. 5 is a flowchart showing an operation example of the selection of the feature set of each cluster in the first embodiment.
Fig. 6 is a flowchart showing an operation example of the clustering of classification target data in the first embodiment.
Fig. 7 shows an operation example of generating the table of rule patterns used by the clustering of the second embodiment.
Fig. 8 shows an operation example of the clustering of classification target data in the second embodiment.
Fig. 9 shows an operation example of another clustering of classification target data in the second embodiment.
Fig. 10 is a flowchart showing an operation example of the clustering of classification target data in the third embodiment.
Fig. 11 is a flowchart showing an operation example of setting the arithmetic expression used as the transformation method of a feature quantity.
Fig. 12 is a flowchart showing an operation example of computing the evaluation value in the flowchart of Fig. 11.
Fig. 13 is a flowchart showing an operation example of computing the distances with the feature quantities transformed by the set transformation methods.
Fig. 14 is a table showing the learning data belonging to each cluster.
Fig. 15 is a results table of classifying the learning data of Fig. 14 by the conventional clustering method.
Fig. 16 is a conceptual diagram explaining the computation method of the overall corrected judgment rate.
Fig. 17 is a results table of classifying the learning data of Fig. 14 by the clustering system of the first embodiment.
Fig. 18 is a results table of classifying the learning data of Fig. 14 by the clustering system of the second embodiment.
Fig. 19 is a results table of classifying the learning data of Fig. 14 by the clustering system of the second embodiment.
Fig. 20 is a block diagram showing a configuration example of an inspection device using the clustering system of the present invention.
Fig. 21 is a flowchart showing an example of the selection of the feature sets in the inspection device of Fig. 20.
Fig. 22 is a flowchart showing an operation example of the clustering in the inspection device of Fig. 20.
Fig. 23 is a block diagram showing the configuration of a defect kind judging device using the clustering system of the present invention.
Fig. 24 is a block diagram showing a configuration example of a manufacturing management device using the clustering system of the present invention.
Fig. 25 is a block diagram showing a configuration example of another manufacturing management device using the clustering system of the present invention.

[Description of main element symbols]

1: feature-set creation part
2: feature extraction part
3: distance calculation part
4: feature-set storage part
5: cluster database
100: inspection object
101: image acquisition part
102: illumination device
103: imaging device
104: defect-candidate detection part
105: cluster part
200, 300: control device
201, 202: image acquisition device
301, 302: manufacturing device
303: notification part
304: recording part
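As a final illustration, returning to the feature transformation of Figs. 11 and 12: the sketch below chooses, per feature, the transform whose result is closest to normal, using the sum of squared differences between the sample z-values (1) and the z-values (2) implied by the standard normal inverse CDF (steps S71 to S77). The (i - 0.5)/n plotting position for the cumulative probability is an assumption, as is the use of SciPy for the inverse CDF; the log and root transforms assume positive feature values.

```python
import numpy as np
from scipy.stats import norm

TRANSFORMS = {"identity": lambda x: x, "log": np.log,
              "sqrt": np.sqrt, "cbrt": np.cbrt}

def normality_error(x):
    """Steps S71-S77: squared error between each sample's z-value (1) and
    the z-value (2) implied by its cumulative rank under a standard normal."""
    x = np.sort(x)
    z1 = (x - x.mean()) / x.std(ddof=1)                # z-value (1)
    p = (np.arange(1, len(x) + 1) - 0.5) / len(x)      # cumulative probability
    z2 = norm.ppf(p)                                   # z-value (2), inverse CDF
    return float(np.sum((z1 - z2) ** 2))

def choose_transform(feature_values):
    """Fig. 11: evaluate every candidate expression and keep the one whose
    transformed distribution is closest to normal (smallest error)."""
    errors = {name: normality_error(f(feature_values))
              for name, f in TRANSFORMS.items()}
    return min(errors, key=errors.get)
```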
Hereinafter, the combination of the feature amounts will be described as a feature amount set. λ = ω〇ωί(μ〇-μΐ)2/(ω〇σ〇2 + ωίσΐ2). . . (3) In the above (3), μ is a centroid vector consisting of the average 特征 of the feature quantities in the feature quantity set of the learning material belonging to the target cluster (the parent group in the cluster). σ is the standard deviation of the vector generated by the feature quantities of the learning materials belonging to the parent group within the cluster. (1) The ratio of the number of learning materials belonging to the parent group in the cluster to the number of learning materials belonging to the full cluster. Again, μ. The center of gravity vector consisting of the average of the feature quantities in the feature quantity set of the learning material (object cluster outer group) belonging to the cluster outside the cluster. σ. It is the standard deviation of the vector generated by the feature quantity of the learning material belonging to the external cluster of the object cluster. ω. The ratio of the number of learning materials belonging to the cluster outer group to the number of learning materials belonging to the full cluster. Here, in the formula (3), (μ.,.) can be used even if the log (logarithm) and the square root number are used. Furthermore, when calculating each vector, the feature quantity set cooperates as -14- 200818060. Section 1 calculates the feature quantity normalized for each feature quantity by the equation (1). Further, even if the inherent 値 of the ratios ωι and ω◦ are set in advance, the number of separations becomes large. Then, the feature amount set cooperation unit 1 calculates the discrimination criterion 値λ which is discriminated from the other clusters by using the above equation (3) for each of the target clusters and for any or all of the feature quantities constituting the learning material. The ranking list of the discrimination reference 値 λ is outputted by arranging the calculated discrimination reference 値 λ ' in descending order. Here, the feature quantity set cooperation unit 1 is a combination of the feature quantities corresponding to the maximum discrimination reference 値λ as the feature quantity set of the target cluster, and the discrimination reference 値λ, and corresponding to the identification information of the cluster, and the memory The feature quantity set storage unit 4. The determination criterion 値λ is determined as shown in Fig. 2(a). When the feature amount set cooperation unit 1 sets the feature quantity set of each cluster, the feature quantities of the learning data and the classification target data are a and b. When four of c, d and d are used, the discriminant reference 値λ of all of the four feature quantities, all of them, and one of them is calculated. Then, the feature quantity set cooperation unit 1 selects the highest number 値, for example, the combination of the feature quantities b and c is selected in Fig. 2(a). Further, in the method of the other criterion 値λ, there is a BSS method as described in FIG. 2(b), that is, a discrimination criterion 値λ for calculating all the feature amounts η included in the set of classification target data. Next, the combination of η_1 is extracted from the set of θ of the feature quantities, and the discrimination criterion 値λ is calculated. 
Then, the combination of the maximum 値 -15-200818060 is selected from the η-1 discriminant reference 値λ, and the reference 値λ is calculated next time from the n-l feature amounts to n-2 all combinations. In this way, even if the feature quantity is reduced by the self-collection by the order, the set of the feature quantity of the self-reduction is selected, and the combination is reduced by one, and the criterion 値λ is calculated, and the combination can be determined by the less feature quantity. In this manner, the feature quantity set cooperation unit 1 may be configured. Furthermore, in the method of the other criterion 値λ, the FSS method as described in FIG. 2(c) is a feature quantity η one by one read feature included in the set of self-classification target data. For all types, the discrimination criterion 値λ of each feature amount is calculated, and the feature amount having the largest discrimination reference 値 is selected. Next, a combination of the feature amounts and the two feature amounts of the other feature amounts is generated, and the discrimination reference 値λ with respect to each combination is calculated. Then, the combination having the largest discriminant reference 値 is selected from the combination. Next, a combination of the three feature quantities of the combination and the feature not included in the combination is generated, and each discrimination criterion 値λ is generated. In this way, even if the feature quantity having the largest discriminant reference 値λ is selected in the order from the combination of the feature quantities before the order, the feature quantity of the combination is increased by i with respect to the combination, and the feature quantity that does not exist in the combination is calculated. The discriminant reference 値λ of the combined feature quantity is calculated, and the discriminant reference 値λ of the combination of the feature quantities which are not present in the feature quantity of the combination is calculated, and finally, all combinations of the discrimination reference 値λ are calculated, and the criterion is determined. It is also possible to form the feature quantity set cooperation unit 1 in such a manner that the largest combination of 値λ is selected as the feature quantity set. Next, by discriminating the reference 値λ, the validity of the selection of the feature quantity set used in the cluster is represented by the third and fourth figures. -16- 200818060 Fig. 3 is a combination of the feature quantities a and h combined with the feature quantities a, b, c, d, and the combination of the feature quantities d and e, as a selection The combination of the feature quantity sets, from these combinations, in the cluster 1 , the cluster 2 , and the cluster 3 , a set of feature quantities having high classification characteristics is selected as compared with the conventional example. In Fig. 3, μΐ corresponds to μ; and μ2 corresponds to μ. , σ ΐ corresponds to σ i , and σ 2 corresponds to σ. , ω 1 corresponds to ω 1, and ω 2 corresponds to ω 〇. Wherein, in the above combination, the criterion 値λ is the largest combination of the feature quantities a and h', and the combination is used for the separation of the cluster 1 and the other clusters, and the cluster 1 is confirmed by the fourth graph and Classification results for clusters other than clusters (clusters 2 and 3). In Fig. 
4, the horizontal axis represents the number of log of the Mahalanobis distance calculated using the combination of the feature quantities, and the vertical axis represents the number of the separated object data (histogram) having the corresponding number 値. Here, the number of the horizontal axis is 値1. 4 means that the number of logs of Mahalanobis is less than 1. 4 must and 1. 2 or more (1. 4 on the left side 値). The numbers on the other horizontal axes are also the same. Furthermore, in Figure 4, it indicates 1. 4S means that I·4 or more. The Mahalanobis distance in Fig. 4 is calculated using the feature quantity set corresponding to cluster 1, and the classification object data belonging to cluster 1 and other clusters are calculated. Fig. 4 (a) is a use feature An example of calculating the Mahalanobis distance by the combination of the quantities a and g, and (b) is an example of calculating the Mahalanobis distance using the combination of the feature quantities & and h, FIG. 4 ( C) An example of calculating the Mahalanobis distance using a combination of the characteristic quantities d and e of the 17-200818060. When viewing the histogram in Fig. 4, when the number of discriminations 値λ is large, it is known that the classification of cluster 1 and other clusters is well performed. Next, the operation of the cluster system according to the embodiment of Fig. 1 will be described with reference to Figs. 5 and 6 . Fig. 5 is a flowchart showing an operation example of the feature quantity set cooperation unit i of the cluster according to the first embodiment, and Fig. 6 is a flowchart showing an operation example of the cluster of the classification team image data. In the following description, for example, when the classification target data is a set of the feature amount of the scratch on the glass article, "a: the length of the scratch" which is the feature amount is obtained from the image processing or the measurement result. "b: the area of the scratch", "c: the width of the scratch", "d: the transmittance of the specific region containing the scratched portion", and "e · the reflectance of the specific region containing the scratch". Therefore, the set of feature quantities (hereinafter, the feature quantity set) becomes {a, b, c, d}. Furthermore, in the present embodiment, the distance used for the cluster analysis is calculated as the Mahalanobis distance using the normalized feature amount. Here, the glass article in the present embodiment may be, for example, a plate glass or a glass substrate for a display. A. Feature amount set cooperation processing (corresponding to the flowchart of Fig. 5) The user detects a scratch on the glass, photographs the image to obtain image data, and performs image processing to extract a scratch portion from the image data. The feature amount such as the length measurement is used to collect the feature amount data composed of the above-described feature amount. Then, for each cluster classified by the user in the cause or shape of the scratches, the feature amount is divided into learning materials according to the known cause or the information of the shape, and the learning is performed as a cluster. The parent group of the data, from the non-graphic processing terminal corresponding to the identification information of the cluster, is memorized to the cluster database 5 (step S1). 
Next, when a control command for generating a feature quantity set for each cluster is input from the processing terminal, the feature quantity set creation unit 1 reads the parent groups of learning data from the cluster database 5 by the identification information of each cluster. Then, for each cluster, the feature quantity set creation unit 1 calculates the average value and standard deviation of each feature quantity over the in-cluster parent group, and uses the average value and standard deviation to normalize the feature quantities of each item of learning data by formula (1).

Next, for every combination of the feature quantities contained in the feature quantity data, the feature quantity set creation unit 1 calculates the discrimination criterion value λ by formula (3). In this calculation, using the normalized feature quantities of the in-cluster parent group of each cluster, the feature quantity set creation unit 1 calculates, for each feature quantity set, the average value (centroid vector) μi of the vectors composed of the corresponding feature quantities and the standard deviation σi of those vectors of the learning data in the in-cluster parent group; using the normalized feature quantities of the out-of-cluster group, it likewise calculates the average value (centroid vector) μo of the vectors composed of the feature quantities corresponding to the feature quantity set and the standard deviation σo of those vectors of the learning data in the out-of-cluster group; it also calculates the ratio ωi of the number of learning data items of the in-cluster parent group to the total number of learning data items, and the ratio ωo of the number of learning data items of the out-of-cluster group to the total number.

Then, using the centroid vectors μi and μo, the standard deviations σi and σo, and the ratios ωi and ωo, the feature quantity set creation unit 1 calculates, by formula (3), the discrimination criterion value λ of the separation between the target cluster and all the other clusters, for every combination of feature quantities. When all the discrimination criterion values λ have been calculated, the feature quantity set creation unit 1 selects, for each cluster, the combination whose λ is the largest, as the feature quantity set used for calculating the distance to that cluster (step S2).

Then, for each feature quantity set R, the feature quantity set creation unit 1 calculates the average value avg(i) and the standard deviation std(i) of each feature quantity i of the set over the in-cluster parent group of each cluster (step S3). In addition, the feature quantity set creation unit 1 calculates the correction coefficient λ^(-1/2) from the discrimination criterion value λ. Because λ differs from cluster to cluster, some deviation arises between the distances obtained from different feature quantity sets, so normalization between feature quantity sets is performed with the correction coefficient λ^(-1/2) in order to improve the classification accuracy. Furthermore, instead of λ^(-1/2), log(λ), or simply some other function of λ, may be used to perform the normalization between feature quantity sets.
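The exact form of the patent's formula (3) is given earlier in the document and is not reproduced here. As an illustration only, the sketch below computes a Fisher-style separation ratio from the same ingredients the text names (μi, μo, σi, σo, ωi, ωo): between-group separation divided by the pooled, weight-averaged within-group spread. It is a stand-in for, not a transcription of, formula (3).

```python
def discrimination_criterion(inside, outside):
    # inside / outside: lists of normalized feature vectors for the
    # in-cluster parent group and the out-of-cluster group.
    # Assumes the two groups are non-degenerate (nonzero within-spread).
    n_i, n_o = len(inside), len(outside)
    w_i, w_o = n_i / (n_i + n_o), n_o / (n_i + n_o)   # the ratios omega
    mu_i = [sum(c) / n_i for c in zip(*inside)]        # centroid mu_i
    mu_o = [sum(c) / n_o for c in zip(*outside)]       # centroid mu_o
    var = lambda rows, mu: sum(
        sum((x - m) ** 2 for x, m in zip(r, mu)) for r in rows) / len(rows)
    between = sum((a - b) ** 2 for a, b in zip(mu_i, mu_o))
    within = w_i * var(inside, mu_i) + w_o * var(outside, mu_o)
    return between / within
```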
Furthermore, the out-of-cluster group used when calculating the centroid vector μo and the discrimination criterion value λ by formula (3) may be selected from any of the following three categories of learning data:

a. all the learning data of the group outside the target cluster, among all the learning data;
b. specific learning data, corresponding to the purpose of the classification, within the group outside the target cluster;
c. the learning data of the group outside the target cluster that were used for the feature quantity selection.

Here, the purpose of the classification in b. is to distinguish the target cluster from a cluster from which it is clearly and conspicuously different, and the corresponding learning data are the learning data contained in the other clusters from which the difference is intended to be established.

Then, the feature quantity set creation unit 1 stores the feature quantity set, the correction coefficient corresponding to the feature quantity set (λ^(-1/2) in the present embodiment), the inverse matrix, the average value avg(i), and the standard deviation std(i) as the distance calculation data in the feature quantity set storage unit 4, in correspondence with the identification information of each cluster (step S4).

B. Clustering processing (corresponding to the flowchart of Fig. 6)

When classification target data is input, the feature quantity extraction unit 2 reads out the feature quantity set corresponding to each cluster from the feature quantity set storage unit 4 by the identification information of the clusters. Then, in accordance with the types of feature quantities in the read feature quantity sets, the feature quantity extraction unit 2 extracts, for each cluster, the feature quantities from the classification target data, and stores the extracted feature quantities in an internal storage unit in correspondence with the identification information of each cluster (step S11).

Next, the distance calculation unit 3 reads out the average value avg(i) and the standard deviation std(i) of each feature quantity from the feature quantity set storage unit 4, normalizes the feature quantities of the classification target data by performing the calculation of formula (2), and replaces the feature quantities stored in the internal storage unit with the normalized feature quantities. Then, the distance calculation unit 3 generates the vector V composed of the elements v(i) obtained as described above, calculates its transpose V^T, sequentially calculates the Mahalanobis distance between the classification target data and each cluster by formula (3), and stores each distance in the internal storage unit in correspondence with the identification information of the cluster (step S12).
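The quadratic form of step S12 can be written compactly. The sketch below assumes the inverse covariance matrix of the feature quantity set has been precomputed and stored with the distance calculation data, as the text's mention of the inverse matrix suggests.

```python
def mahalanobis_sq(v, inv_cov):
    # Squared Mahalanobis distance V^T * inv_cov * V for a normalized
    # feature vector v (list of floats) and an inverse covariance matrix
    # inv_cov (list of rows), matching the V / V^T product of step S12.
    u = [sum(a * b for a, b in zip(row, v)) for row in inv_cov]
    return sum(a * b for a, b in zip(v, u))
```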
Next, the distance calculation unit 3 multiplies each calculated Mahalanobis distance by the correction coefficient λ^(-1/2) corresponding to the feature quantity set, and replaces each Mahalanobis distance with the corrected distance thus obtained (step S13). Furthermore, before multiplying by the correction coefficient, the logarithm or the square root of the Mahalanobis distance may be taken.

Then, the distance calculation unit 3 compares the corrected distances to the clusters stored in the internal storage unit (step S14), detects the smallest corrected distance, takes the cluster whose identification information corresponds to that corrected distance as the cluster to which the classification target data belongs, and stores the classified classification target data in the cluster database 5 in correspondence with the identification information of the cluster to which it was assigned (step S15).

[Second Embodiment]

In the first embodiment, one feature quantity set used for cluster analysis is set for each cluster type. In the second embodiment described below, however, a plurality of feature quantity sets are used for each cluster: the Mahalanobis distance is computed for each feature quantity set and converted into a corrected distance, the corrected distances are arranged in ascending order, and the cluster to which the classification target data belongs is determined from the corrected distances within a specific number of upper ranks, in accordance with rules set in advance. That is, the distance calculation unit 3 in the present embodiment detects which cluster the classification target data belongs to by means of rule patterns, set on the basis of the rank order of the distances between the classification target data and each cluster obtained for each feature quantity set, which represent the criteria for classifying the classification target data into the respective clusters.

The configuration of the second embodiment is the same as that of the first embodiment shown in Fig. 1, and the same reference numerals are given to the respective components; only the operations that differ from the first embodiment will be described, using Fig. 7. In the second embodiment, there is a process of setting the rule patterns from the learning data. Fig. 7 is a flowchart showing an operation example of the pattern learning that sets the rule patterns from the distances. Figs. 8 and 9 are flowcharts showing operation examples of the clustering in the second embodiment.

Furthermore, in the first embodiment, when creating the feature quantity sets, the feature quantity set creation unit 1 calculates, for each cluster, the discrimination criterion value λ for each of the many feature quantity sets that are combinations of feature quantities, and sets the combination with the largest λ as the feature quantity set of that cluster. In the second embodiment, by contrast, the feature quantity set creation unit 1 obtains, for each cluster, a plurality of discrimination criterion values λ for the feature quantity sets corresponding to the combinations of feature quantities, against one, several, or all of the other clusters, and sets for each cluster a plurality of feature quantity sets used for separating it from the other clusters.
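Before going into the details of the second embodiment, the first embodiment's whole decision rule (steps S11 to S15) can be summarized in a sketch. The field names idx, avg, std, inv_cov, and coeff are hypothetical, chosen only for this illustration; they stand for the distance calculation data the text says is stored per cluster.

```python
import math

def mahalanobis_sq(v, inv_cov):
    u = [sum(a * b for a, b in zip(row, v)) for row in inv_cov]
    return sum(a * b for a, b in zip(v, u))

def classify_nearest(target, clusters):
    # For each cluster: pick out its feature quantities and normalize them
    # (steps S11/S12), take the Mahalanobis distance, scale it by the
    # cluster's correction coefficient lambda**(-1/2) (step S13), and keep
    # the cluster with the smallest corrected distance (steps S14/S15).
    best_id, best_d = None, float("inf")
    for cid, c in clusters.items():
        v = [(target[i] - m) / s
             for i, m, s in zip(c["idx"], c["avg"], c["std"])]
        d = c["coeff"] * math.sqrt(mahalanobis_sq(v, c["inv_cov"]))
        if d < best_d:
            best_id, best_d = cid, d
    return best_id, best_d
```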
Then, for each feature quantity set, the feature quantity set creation unit 1 calculates the distance calculation data, and stores the plurality of feature quantity sets and the distance calculation data of each feature quantity set in the feature quantity set storage unit 4, in correspondence with the identification information of the cluster.

Then, in Fig. 7, when learning data is input, the feature quantity extraction unit 2 reads out the plurality of feature quantity sets corresponding to each cluster from the feature quantity set storage unit 4 by the identification information of the clusters. Then, in accordance with the types of feature quantities in each of the read feature quantity sets, the feature quantity extraction unit 2 extracts, for each cluster and each feature quantity set, the feature quantities from the learning data, and stores the extracted feature quantities in the internal storage unit in correspondence with the identification information of the cluster (step S21).

Next, the distance calculation unit 3 reads out, from the feature quantity set storage unit 4, the average value avg(i) and standard deviation std(i) corresponding to each feature quantity of each feature quantity set, normalizes the feature quantities extracted from the learning data by performing the calculation of formula (2), and replaces the feature quantities stored in the internal storage unit with the normalized feature quantities. Then, the distance calculation unit 3 generates the vector V composed of the elements v(i) obtained as described above, calculates its transpose V^T, sequentially calculates the Mahalanobis distance between the learning data and each cluster by formula (3), and stores it in the internal storage unit for each feature quantity set, in correspondence with the identification information of the cluster (step S22).

Next, the distance calculation unit 3 multiplies each calculated Mahalanobis distance by the correction coefficient λ^(-1/2) corresponding to the feature quantity set, and replaces each Mahalanobis distance with the corrected distance thus obtained (step S23). Then, the distance calculation unit 3 arranges the corrected distances to the clusters stored in the internal storage unit in ascending order (the smaller corrected distances are placed in the upper ranks); that is, the identification information of the clusters whose corrected distances to the learning data are small is arranged in the upper ranks (step S24).

Next, the distance calculation unit 3 examines the identification information of the clusters for the corrected distances from the smallest (top) to the n-th, and counts the number of occurrences of the identification information of each cluster among these n; that is, it performs voting for each cluster. Then, the distance calculation unit 3 detects, from the count patterns of the cluster identification information of each item of learning data, a rule pattern common to the learning data contained in the same cluster. For example, with n set to 10, if for the learning data of cluster A the counts are detected as cluster A: 5, cluster B: 3, and cluster C: 2, this is set as rule 1.
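A minimal sketch of the ranking-and-voting step just described: the tuples pair each feature quantity set's corrected distance with the cluster it was computed against, and the sample numbers are invented to reproduce the rule-1 counts in the text.

```python
from collections import Counter

def top_n_votes(distances, n=10):
    # distances: (corrected_distance, cluster_id) pairs, one per feature
    # quantity set, as produced in steps S22/S23. Returns the per-cluster
    # vote counts over the n smallest distances (step S24 and after).
    ranked = sorted(distances)[:n]
    return Counter(cid for _, cid in ranked)

votes = top_n_votes([(0.4, "A"), (0.6, "A"), (0.7, "B"), (1.1, "C"),
                     (1.2, "A"), (1.3, "B"), (1.5, "C"), (1.6, "A"),
                     (1.8, "B"), (2.0, "A")])
# Counter({'A': 5, 'B': 3, 'C': 2}) -- the shape of the rule-1 example.
```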
Furthermore, for the learning data of cluster C, a pattern in which cluster C is counted three times is not necessarily accompanied by fixed counts for the other clusters (cluster A may count 7 and cluster B may count 0, for example); in such a case, a rule that the data is classified as cluster C whenever the count of cluster C is 3 or more, irrespective of the counts of the other clusters, may be set, and this is set as rule 2. Furthermore, for the learning data of cluster A, a rule such as classifying the data as cluster A whenever cluster A shows the arrangement pattern of occupying the first and second positions from the top, irrespective of the counts of the other clusters and even if the count of cluster B is 8, may be set, and this is set as rule 3. As described above, the regularity of the count numbers of the clusters for data classified into the same cluster is detected, and is stored as a pattern table in correspondence with the identification information of each cluster. Here, not only one rule but a plurality of rules may be set for each cluster. Further, although in the above description the distance calculation unit 3 extracts the rule patterns, the user may set the counts or the rule patterns arbitrarily in order to adjust the accuracy of classification into each cluster. Depending on the cluster, the characteristics of the feature information of some clusters are similar to those of other clusters; for classification target data correlated with a plurality of clusters, classifying by the pattern of the counts of each cluster, or by the arrangement pattern from the upper ranks, can give higher accuracy, and the present embodiment complements this point.

Next, the clustering processing of the second embodiment, which uses the rules stored in the above table, will be described using the flowchart of Fig. 8. When classification target data is input, the feature quantity extraction unit 2 reads out the plurality of feature quantity sets corresponding to each cluster from the feature quantity set storage unit 4 by the identification information of the clusters. Then, in accordance with the types of feature quantities in each of the read feature quantity sets, the feature quantity extraction unit 2 extracts, for each cluster and each feature quantity set, the feature quantities from the classification target data, and stores the extracted feature quantities in the internal storage unit in correspondence with the identification information of the cluster (step S31).

Next, the distance calculation unit 3 reads out, from the feature quantity set storage unit 4, the average value avg(i) and standard deviation std(i) of each feature quantity of each feature quantity set, normalizes the feature quantities extracted from the classification target data by performing the calculation of formula (2), and replaces the feature quantities stored in the internal storage unit with the normalized feature quantities. Then, the distance calculation unit 3 generates the vector V composed of the elements v(i) obtained as described above, calculates its transpose V^T, calculates the Mahalanobis distance between the classification target data and each cluster by formula (3), and stores it in the internal storage unit for each feature quantity set, in correspondence with the identification information of the cluster (step S32).
Next, the distance calculation unit 3 multiplies each calculated Mahalanobis distance by the correction coefficient λ^(-1/2) corresponding to the feature quantity set, and replaces each Mahalanobis distance with the corrected distance thus obtained (step S33). Then, the distance calculation unit 3 arranges the corrected distances to the clusters stored in the internal storage unit in ascending order; that is, the identification information of the clusters whose corrected distances to the classification target data are small is arranged in the upper ranks (step S34). After the arrangement, the distance calculation unit 3 examines the identification information of the clusters for the corrected distances from the smallest (top) to the n-th, and counts the number of occurrences of the identification information of each of the n clusters; that is, it performs voting for each cluster.

Next, the distance calculation unit 3 executes matching processing to determine whether the pattern of counts of each cluster in the upper ranks (or the arrangement pattern) obtained for the classification target data exists in the internally stored table (step S35). When, as a result of the comparison, a rule pattern matching the target pattern of the classification target data is found in the table, the distance calculation unit 3 determines that the classification target data belongs to the cluster whose identification information corresponds to the matching rule, and classifies the classification target data into that cluster (step S36).

Further, other clustering processing of the second embodiment using the rules stored in the above table will be described using the flowchart of Fig. 9. In the other clustering processing shown in Fig. 9, the processing of steps S31 to S35 is the same as the processing shown in Fig. 8; as described above, in step S35 the distance calculation unit 3 executes the matching processing of the target pattern of the classification target data against the rule patterns stored in the internal table. Then, the distance calculation unit 3 detects whether a rule pattern matching the target pattern has been found: when a matching rule pattern is detected, the processing proceeds to step S47, and when it is detected that no matching rule pattern has been found, the processing proceeds to step S48 (step S46).

When a matching rule pattern is detected, the distance calculation unit 3 determines that the classification target data belongs to the cluster whose identification information corresponds to the matching rule, classifies the classification target data into that cluster, and stores the classified classification target data in the cluster database 5 in correspondence with the identification information of the cluster to which it was assigned (step S47). On the other hand, when it is detected that there is no matching rule pattern, the distance calculation unit 3 detects the identification information having the largest count, that is, the largest number of votes, and classifies the classification target data into the cluster corresponding to that identification information. Then, the distance calculation unit 3 stores the classified classification target data in the cluster database 5 in correspondence with the identification information of the cluster to which it was assigned (step S48).
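The Fig. 9 flow can be condensed into a few lines. The predicates below mirror rules 2 and 3 from the text but are illustrative only; the patent leaves the concrete encoding of the rule patterns open.

```python
def classify_with_rules(votes, ranking, rules):
    # Fig. 9 in miniature: consult the rule-pattern table first (steps
    # S35/S46); on a match, return that rule's cluster (step S47);
    # otherwise fall back to the cluster with the most votes (step S48).
    # `rules` is a list of (predicate, cluster_id) pairs.
    for predicate, cid in rules:
        if predicate(votes, ranking):
            return cid
    return max(votes, key=votes.get)

rules = [
    (lambda v, r: v.get("C", 0) >= 3, "C"),    # rule 2 from the text
    (lambda v, r: r[:2] == ["A", "A"], "A"),   # rule 3 from the text
]
print(classify_with_rules({"A": 7, "B": 0, "C": 3},
                          ["A", "A", "C", "A", "C", "A", "A", "C", "A", "A"],
                          rules))  # -> "C" by rule 2
```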
[Third Embodiment]

In the second embodiment, a table in which rule patterns are registered is created from the upper n corrected distances, taken in ascending order of the calculated distances between the classification target data and each cluster (that is, in descending order of similarity), and whether each item of classification target data matches a rule pattern present in the table determines its clustering. However, as in the third embodiment, it is also possible to set a plurality of feature quantity sets for each cluster, compute the Mahalanobis distance for each feature quantity set, convert it into a corrected distance, and take the cluster appearing most often among the corrected distances within a specific number of upper ranks as the cluster to which the classification target data belongs.

In the following, the configuration of the third embodiment is the same as the first and second embodiments shown in Fig. 1, and the same reference numerals are given to the respective components; only the operations different from the second embodiment will be described, using Fig. 10. In the third embodiment, the processing of setting the above rules from the learning data is not performed; instead, step S48 of Fig. 9 is in effect executed directly. Fig. 10 is a flowchart showing an operation example of the clustering in the third embodiment.

In the clustering processing shown in Fig. 10, the processing from step S31 to step S34 is the same as the processing shown in Fig. 8: as described earlier, in step S34 the distance calculation unit 3 arranges the corrected distances to the clusters stored in the internal storage unit in ascending order, that is, the identification information of the clusters whose corrected distances to the classification target data are small is placed in the upper ranks (step S34). Next, the distance calculation unit 3 examines the identification information of the cluster corresponding to each corrected distance from the smallest (top) to the n-th, and counts the number of occurrences of the identification information of each of the n clusters; that is, it performs voting processing for each cluster (step S55). Then, the distance calculation unit 3 detects the identification information having the maximum count value (number of votes) in the voting result, takes the cluster corresponding to that identification information as the cluster to which the classification target data belongs, and stores the classified classification target data in the cluster database 5 in correspondence with the identification information of the cluster to which it was assigned (step S56).

Furthermore, the distance calculation unit 3 may hold a threshold, set in advance by the user, for the number of votes of each piece of identification information; when the number of votes of the identification information with the largest number of votes is less than the threshold, processing may be executed such that the classification target data is judged to belong to no cluster.
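A minimal sketch of the third embodiment's decision (steps S34, S55, S56), together with the vote threshold described above; the thresholds mapping is a hypothetical name for the user-set minimum votes per cluster.

```python
from collections import Counter

def majority_cluster(distances, n=10, thresholds=None):
    # Rank the corrected distances, vote over the top n, and return the
    # winning cluster -- or None when the winner's vote count falls below
    # its user-set threshold (no cluster assigned).
    ranked = sorted(distances)[:n]
    votes = Counter(cid for _, cid in ranked)
    winner, count = votes.most_common(1)[0]
    if thresholds and count < thresholds.get(winner, 0):
        return None
    return winner

demo = [(0.4, "A"), (0.6, "A"), (0.7, "B"), (1.1, "C"), (1.2, "A"),
        (1.3, "B"), (1.5, "C"), (1.6, "A"), (1.8, "B"), (2.0, "A")]
print(majority_cluster(demo, thresholds={"A": 6}))  # None: A has 5 < 6 votes
```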
For example, in classification into the three clusters A, B, and C, suppose the number of votes for the identification information of cluster A is 5, the number of votes for cluster B is 3, and the number of votes for cluster C is 2; the identification information with the largest number of votes detected by the distance calculation unit 3 is then that of cluster A. However, when the above threshold is set to 6 for cluster A, the distance calculation unit 3 judges that the classification target data belongs to no cluster, because the number of votes for the identification information of cluster A is less than the threshold. Accordingly, in the cluster analysis of clusters whose feature quantities differ only slightly from those of other clusters, the reliability of the classification processing can be improved.

[Method of Transforming Feature Quantities]

Although each parent group of feature quantities is expected to follow a normal distribution when cluster analysis is performed, depending on the type of feature quantity (area, length, etc.) the parent group may have a skewed distribution, and the accuracy of the distance between the classification target data and each cluster, that is, of the similarity between the classification target data and each cluster, may deteriorate. Therefore, depending on the feature quantity, some feature quantities need to be transformed by a specific method so that the distribution of the parent group approaches a normal distribution and the accuracy of the similarity improves. As transformation methods for approaching the normal distribution, the numerical value of the feature quantity may be transformed by any of several arithmetic expressions, such as taking the logarithm (log), the square root (√), the cube root (3√), an n-th root, or a power.

Hereinafter, the method of setting the transformation of each feature quantity will be described using Fig. 11. Fig. 11 is a flowchart showing an operation example of the setting processing of the transformation method of each feature quantity. The transformation method is set for each cluster, in units of the feature quantities contained in the cluster, and the setting is performed using the learning data belonging to each cluster. The following processing is described as being executed by the feature quantity set creation unit 1, but another processing unit corresponding to this processing may be provided.

The feature quantity set creation unit 1, using the identification information of the cluster subject to classification as a key, reads out the learning data contained in the cluster from the cluster database, and acquires (normalizes) the feature quantities of each item of learning data (step S61). Then, the feature quantity set creation unit 1 transforms the feature quantities of the read learning data using one of the internally stored arithmetic expressions for feature quantity transformation (step S62). When the transformation of the feature quantities of all the learning data is completed, the feature quantity set creation unit 1 calculates an evaluation value indicating how close the distribution obtained by the transformation processing is to a normal distribution (step S63).
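The transformation-selection loop of Fig. 11 (steps S62 to S65, the last two described next) might look as follows. The candidate set is illustrative, not the patent's stored list, and log and square root assume strictly positive feature values such as lengths or areas.

```python
import math

# Candidate transforms standing in for the stored arithmetic expressions;
# the text also allows n-th roots and similar expressions.
TRANSFORMS = {
    "identity": lambda x: x,
    "log": lambda x: math.log(x),          # assumes x > 0
    "sqrt": lambda x: math.sqrt(x),        # assumes x >= 0
    "cbrt": lambda x: x ** (1.0 / 3.0),    # assumes x >= 0
}

def best_transform(values, evaluate):
    # Apply every candidate expression to the feature values and keep the
    # one whose transformed distribution gets the smallest
    # (closest-to-normal) evaluation value from `evaluate`.
    scored = {name: evaluate([f(x) for x in values])
              for name, f in TRANSFORMS.items()}
    return min(scored, key=scored.get)
```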
Next, the feature quantity set creation unit 1 detects whether the evaluation value has been calculated for all the arithmetic expressions set in advance as transformation methods: when it is detected that the evaluation value of the transformed distribution has been calculated for all the arithmetic expressions, the processing proceeds to step S65, and when it is detected that not all the arithmetic expressions have yet been calculated, the processing returns to step S62 in order to execute the processing for the next set arithmetic expression (step S64). When the feature quantity transformation by all the arithmetic expressions has ended, the feature quantity set creation unit 1 detects the distribution with the smallest evaluation value among the distributions obtained from the set arithmetic expressions, that is, the distribution closest to the normal distribution, determines the transformation method of the arithmetic expression giving the detected distribution, and internally sets it as the method of transforming that feature quantity of the cluster (step S65). The feature quantity set creation unit 1 performs the above processing for each feature quantity of each cluster, and sets a transformation method corresponding to each feature quantity of each cluster.

Next, the calculation of the evaluation value in step S63 above will be described using Fig. 12. Fig. 12 is a flowchart showing an operation example of the processing for calculating the evaluation value of the distribution obtained by an arithmetic expression. The feature quantity set creation unit 1 transforms the feature quantities of each item of learning data belonging to the target cluster by the set arithmetic expression (step S71). After the feature quantities of all the learning data have been transformed, the feature quantity set creation unit 1 calculates the average value μ and the standard deviation σ of the distribution (parent group) formed by the transformed feature quantities (step S72). Then, the feature quantity set creation unit 1 calculates the z-value (1) from (x−μ)/σ, using the average value μ and standard deviation σ of the parent group (step S73). Next, the feature quantity set creation unit 1 calculates the cumulative relative frequency within the parent group (step S74). After the calculation, the feature quantity set creation unit 1 calculates, as the z-value (2), the inverse function of the cumulative distribution function of the standard normal distribution evaluated at the cumulative relative frequency within the parent group (step S75). Then, the feature quantity set creation unit 1 obtains, for the distribution of the feature quantity, the difference between the two z-values, that is, between z-value (1) and z-value (2), namely the error between the two z-values in the distribution (step S76). When the errors of the z-values are obtained, the feature quantity set creation unit 1 calculates the sum of the squares of the errors of the two z-values as the evaluation value (step S77). The smaller the error between the two z-values, the closer the distribution is to the normal distribution; if there is no error, the distribution is the normal distribution. Conversely, the further the distribution departs from the normal distribution, the larger the error becomes.
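A minimal sketch of the evaluation value of Fig. 12 (steps S71 to S77), usable as the evaluate argument of the previous sketch: it compares each sample's empirical z-value with the theoretical z-value obtained from the inverse CDF of the standard normal at the sample's cumulative relative frequency, and sums the squared differences. The plotting position (i + 0.5)/n is one common convention; the patent does not pin this detail down.

```python
from statistics import NormalDist, mean, pstdev

def normality_evaluation(values):
    # Smaller is closer to normal; assumes the values are not all equal
    # (nonzero standard deviation).
    xs = sorted(values)
    mu, sigma = mean(xs), pstdev(xs)
    n = len(xs)
    err = 0.0
    for i, x in enumerate(xs):
        z1 = (x - mu) / sigma                       # z-value (1)
        z2 = NormalDist().inv_cdf((i + 0.5) / n)    # z-value (2)
        err += (z1 - z2) ** 2
    return err
```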
Next, the transformation of the feature quantities of the classification target data performed before the clustering processing of the first to third embodiments will be described using Fig. 13. Fig. 13 is a flowchart showing an operation example of calculating the feature quantity data of the classification target data with transformed feature quantities. The distance calculation unit 3 extracts the feature quantities subject to recognition from the input classification target data in accordance with the feature quantity sets set for each cluster, and performs all the normalization processing (step S81).

Next, the distance calculation unit 3 transforms the feature quantities used for classification into a given classification target cluster by the transformation method (arithmetic expression) set for each feature quantity of that cluster (step S82). Then, the distance calculation unit 3 calculates the distance to the classification target cluster as described in the first to third embodiments (step S83). Next, the distance calculation unit 3 detects whether, for every classification target cluster, the feature quantities have been transformed by the transformation methods set for the feature quantities of that cluster and the distance to the cluster has been calculated with the transformed feature quantities: when the distances have been obtained for all the classification target clusters, the processing proceeds to step S85, and when a remaining classification target cluster is detected, the processing returns to step S82 (step S84). Then, in each of the first to third embodiments, the processing is started from the point at which the calculation of the distances ends (step S85).

For the above reason, when the distance between the classification target data and each cluster is obtained by the Mahalanobis distance used in the present embodiment, since the feature quantities are expected to follow a normal distribution, the closer the distribution of each feature quantity of the parent group is to the normal distribution, the more accurately the distance (similarity) to each cluster can be obtained, and an improvement in the accuracy of classification into clusters can be expected.

EXAMPLES

[Calculation Example] Next, using the clustering systems of the first, second, and third embodiments, the classification accuracy was compared with that of the conventional example using the sample data shown in Fig. 14. It can be seen that, although the number of samples is small, accuracy equal to or better than that of the conventional example is obtained regardless of which feature quantities are used. In Fig. 14, 10 items of learning data are defined for each of the cluster types 1, 2, and 3, and each item of learning data has the eight feature quantities a, b, c, d, e, f, g, and h. In this example, the feature quantity sets used for clustering are determined from the learning data belonging to each cluster shown in Fig. 14, and the learning data are then also used as classification target data to perform the cluster analysis.

As a first calculation result, Fig. 15 shows, for the conventional calculation method, the Mahalanobis distances and the judgment results obtained by computing, with the combination of feature quantities a and g, each item of learning data of clusters 1 to 3 shown in Fig. 14. In Fig. 15(a), the column Cluster 1 lists the Mahalanobis distance to cluster 1, the column Cluster 2 the Mahalanobis distance to cluster 2, and the column Cluster 3 the Mahalanobis distance to cluster 3. Further, the category column indicates the cluster to which each item of learning data actually belongs, and the judgment result indicates the cluster for which the Mahalanobis distance of the learning data is smallest; the counts indicate the number of feature quantity data items whose correct classification type agrees with the judgment result. In Fig. 15(b), the numbers indicate, by row, the cluster to which the learning data actually belongs and, by column, the judged cluster.
For example, the "8" at marker R1 means that 8 of the 10 items of learning data of cluster 1 were judged to be cluster 1, and the "2" at marker R2 means that 2 of the 10 items of learning data of cluster 1 were judged to be cluster 2. p0 is a ratio indicating the agreement between the correct answer and the judgment, p1 is the probability of the two agreeing by chance, and k is the total corrected agreement rate, obtained by the following expressions; the higher k is, the higher the classification accuracy:

k = (p0 - p1)/(1 - p1)
p0 = (a + d)/(a + b + c + d)
p1 = [(a + b)·(a + c) + (b + d)·(c + d)]/(a + b + c + d)^2

The relationship between a, b, c, and d in the above expressions will be described using Fig. 16. The number of data items belonging to cluster 1 and classified as cluster 1 is a, and the number of data items belonging to cluster 1 but classified as cluster 2 is b, so a + b is the number of data items belonging to cluster 1. Furthermore, the number of data items belonging to cluster 2 and classified as cluster 2 is d, and the number of data items belonging to cluster 2 but classified as cluster 1 is c, so c + d is the number of data items belonging to cluster 2. a + c is the number of items classified as cluster 1 among all the data a + b + c + d, and b + d is the number of items classified as cluster 2 among all the data a + b + c + d.

Next, Fig. 17 shows the Mahalanobis distances and the judgment results obtained by the calculation method of the first embodiment for each item of learning data of clusters 1 to 3 shown in Fig. 14. The way of reading Figs. 17(a) and (b) is the same as for Fig. 15, and the description is therefore omitted. It can be seen that the correct rate p0, the chance agreement rate p1, and the total corrected agreement rate k are equal to those of the calculation method of Fig. 15. Here, the method of selecting, from the combinations described above, the combination with the largest discrimination criterion value λ for each cluster was used to create the feature quantity set corresponding to each cluster: the feature quantity set corresponding to cluster 1 is the combination of feature quantities a and h, the feature quantity set corresponding to cluster 2 is the combination of feature quantities a and d, and the feature quantity set corresponding to cluster 3 is the combination of feature quantities a and g.

Next, Fig. 18 shows the Mahalanobis distances and the judgment results obtained by the calculation method of the second embodiment for each item of learning data of clusters 1 to 3 shown in Fig. 14. The way of reading Figs. 18(a) and (b) is the same as for Fig. 15, and the description is therefore omitted. The correct rate p0 is 0.8333, the chance agreement rate p1 is 0.3333, and the total corrected agreement rate k is 0.75; it can be seen that the classification accuracy is improved compared with the conventional calculation method of Fig. 15.
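The k, p0, and p1 expressions above are Cohen's kappa for the 2x2 layout of Fig. 16 and transcribe directly into code; the sanity check uses the rates quoted for Fig. 18.

```python
def cohens_kappa(a, b, c, d):
    # a, d: correctly classified counts on the diagonal of Fig. 16;
    # b, c: misclassified counts off the diagonal.
    n = a + b + c + d
    p0 = (a + d) / n
    p1 = ((a + b) * (a + c) + (b + d) * (c + d)) / n ** 2
    return (p0 - p1) / (1 - p1)

# Sanity check against the text: with p0 = 0.8333... and p1 = 0.3333...,
# k = (0.8333 - 0.3333) / (1 - 0.3333) = 0.75.
```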
Here, the method of selecting the combinations whose discrimination criterion values λ rank in the top three in each cluster was used to determine the feature quantity sets for each cluster: the three combinations {a, h}, {a, g}, and {d, e} are used as the feature quantity sets corresponding to cluster 1; the three combinations {a, f}, {a, d}, and {a, b} as the feature quantity sets corresponding to cluster 2; and the three combinations {e, g}, {a, c}, and {a, g} as the feature quantity sets corresponding to cluster 3. Furthermore, the judgment by voting is performed by arranging the distances in ascending order of Mahalanobis distance, counting the clusters appearing from the smallest up to the third, and taking the cluster with the largest count as the cluster to which the classification target data belongs.

Next, Fig. 19 shows the judgment results obtained by the calculation method of the second embodiment for each item of learning data of clusters 1 to 3 shown in Fig. 14, where each calculated Mahalanobis distance is multiplied by the correction coefficient λ^(-1/2) before the distances are ranked. The way of reading Figs. 19(a) and (b) is the same as for Fig. 15, and the description is therefore omitted. The correct rate p0 is 0.8333, the chance agreement rate p1 is 0.3333, and the total corrected agreement rate k is 0.75; it can be seen that the classification accuracy is improved compared with the conventional calculation method of Fig. 15. Here, the method of selecting the combinations whose discrimination criterion values λ rank in the top three in each cluster was used to select the feature quantity sets for each cluster: the three combinations {a, h}, {a, g}, and {d, e} are used as the feature quantity sets corresponding to cluster 1; the three combinations {a, f}, {a, d}, and {a, b} as the feature quantity sets corresponding to cluster 2; and the three combinations {e, g}, {a, c}, and {a, g} as the feature quantity sets corresponding to cluster 3. The distances are arranged in ascending order of Mahalanobis distance, the clusters appearing from the smallest up to the third are counted, and the cluster with the largest count is taken as the cluster to which the classification target data belongs.

From the classification results shown in Figs. 15, 17, 18, and 19 above, it is judged that the present embodiments perform clustering processing with high speed and high accuracy compared with the conventional example, and the superiority of the present embodiments over the conventional example is confirmed.

[Application Examples of the Present Invention]

A. Inspection device

An inspection device (for example, a defect inspection device) that classifies the types of scratches on the surface of an inspection object, for example a glass substrate, is shown in Fig. 20. Fig. 21 is a flowchart explaining an operation example of the selection of the feature quantity sets, and Fig. 22 is a flowchart explaining an operation example of the clustering processing. First, the selection operation of the feature quantity sets will be described. The collection of learning data in step S1 of the flowchart of Fig. 5 corresponds to steps S101 to S105 of the flowchart of Fig. 21; since steps S2 to S4 of Fig. 21 are the same as those of the flowchart of Fig. 5, their description is omitted. By the operation of the operator, samples of learning data corresponding to the clusters of the scratch types to be classified are collected (step S101).
The illumination device 102 illuminates the collected scratch samples, the photographing device 103 photographs them, and the image acquisition unit 101 thereby acquires the image data of the scratched portions as learning data (step S102). Then, from the image data acquired by the image acquisition unit 101, the feature quantities of the scratches of the learning data are calculated (step S103). The acquired feature quantities of the learning data are associated with the clusters obtained by, for example, visual classification, and the learning data of each cluster are thus specified (step S104). Then, until the learning data of each cluster reaches a specific number (a number of samples set in advance), for example about 30 each, the processing from step S101 to step S104 is repeated (step S105); when the specific number is reached, the clustering unit 105 executes all the processing from step S2 onward described in Fig. 5. Here, the clustering unit 105 is the clustering system of the first or second embodiment.

Next, the clustering processing in the inspection device of Fig. 20 will be described with reference to Fig. 22. Here, steps S31 to S34, S55, and S56 of Fig. 22 are the same as those of the flowchart of Fig. 10, and their description is omitted. In the inspection device of Fig. 20, when the inspection is started, the illumination device 102 illuminates the glass substrate that is the inspection object 100, the photographing device 103 photographs the surface of the glass substrate, and the photographed image is output to the image acquisition unit 101. The defect candidate detection unit 104 then detects, in the photographed image input from the image acquisition unit 101, portions differing from the plane shape, and sets them as the defect candidates to be classified (step S201). Next, the defect candidate detection unit 104 cuts out the image data of the defect candidate portions from the photographed image as classification target data. Then, the defect candidate detection unit 104 calculates the feature quantities from the image data of the classification target data, and outputs to the clustering unit 105 the classification target data composed of the extracted feature quantity sets (step S202). The subsequent clustering processing is as explained in the steps of Fig. 10. As described above, the inspection device of the present invention can classify with high accuracy the scratches existing on a glass substrate into the respective scratch types.

B. Defect type determination device

The defect type determination device shown in Fig. 23 performs clustering with the clustering system of the present invention described above. The image capturing devices 201 and 202 are each composed of the illumination device 102 and the photographing device 103 of Fig. 20. The learning data of the site where the classification target data is to be classified is acquired in advance, the learning data of each cluster is prepared in the cluster database of the clustering unit 105, and the setting of the feature quantity sets of Fig. 5 is also completed. The photographing devices of the image capturing devices mounted on the respective manufacturing devices detect the defect candidates, extract the feature quantities from the image data, and output them to the data collection device 203. The classification target data collected by the data collection device 203 of the control device 200 is transferred to the clustering unit 105; as described above, the clustering unit 105 classifies the input classification target data into the corresponding scratch cluster.
C. Manufacturing management device

The manufacturing management device of the present invention is configured, as shown in Fig. 24, with a control device 300, manufacturing devices 301 and 302, a notification unit 303, a recording unit 304, a defective-device determination unit 305, and a defect type determination device 306. Here, the defect type determination device 306 is of the same configuration as the defect type determination device described in item B above, including the defect candidate detection unit 104; from the photographed images of the image capturing devices 201 and 202 provided in the respective manufacturing devices 301 and 302, it extracts the feature quantities and performs classification of the classification target data with high accuracy. Next, the defective-device determination unit 305 has a table describing the relationship between the identification information of the clusters and the occurrence factors of defects; from the identification information of the cluster input from the defect type determination device 306, it determines, using the table, the cause of occurrence and the manufacturing device responsible. In other words, the defective-device determination unit 305 detects the cause of the defect in the manufacturing process of the product corresponding to the identification information. The determination result is then reported by the notification unit 303, and the recording unit 304 stores, as a history, the date of the determination, the identification number of the defect classification, the cause of occurrence, and the identification information of the manufacturing device, in correspondence with one another. Further, the control device 300 may stop the manufacturing device determined by the defective-device determination unit 305, or control its control parameters.

D. Manufacturing management device

Another manufacturing management device of the present invention is constituted, as shown in Fig. 25, by the control device 300, the manufacturing devices 301 and 302, the notification units 303 and 308, and the clustering unit 105. Here, the clustering unit 105 is of the same configuration as described in item B above. Unlike the cases A to C described above, the input to the clustering unit 105 is not the feature quantities of image data: the feature quantities here represent the manufacturing conditions in the manufacturing process of, for example, a glass substrate (material components, processing temperature, pressure, processing speed, and the like), measured by the sensors on each manufacturing device, and, as process information collected from each manufacturing device 301 or 302, they are input to the clustering unit 105 according to the manufacturing state of each step of the process. By the classification described above, the clustering unit 105 classifies the state of the glass manufacturing process in each manufacturing device, for example into a "normal state", a "state requiring adjustment", and a "dangerous state requiring immediate adjustment".
Then, the clustering unit 105 outputs the identification information of the cluster of the classification result to the control device 300, and the notification unit 303 notifies the classification result; the recording unit 309 stores as a history, in correspondence with the date of the determination, the identification number of the classification of the manufacturing state of each process step, the manufacturing conditions at that time, and the identification information of the manufacturing device. The control device 300 has a table describing, for each piece of cluster identification information, the adjustment items and target data of the manufacturing conditions for returning the manufacturing state to normal; it reads the adjustment items and data corresponding to the cluster identification information input from the clustering unit 105, and controls the manufacturing devices with the read data so as to return them to the normal state.

Further, the clustering system of Fig. 1 may be realized by recording a program for realizing its functions on a computer-readable recording medium, and having a computer system read and execute the program recorded on the recording medium to perform the classification processing. The term "computer system" as used here includes an OS and hardware such as peripheral devices. Furthermore, the "computer system" also includes a WWW system provided with a web page providing environment (or display environment). Further, the "computer-readable recording medium" refers to portable media such as a floppy disk, a magneto-optical disk, a ROM, or a CD-ROM, and a storage device such as a hard disk built into the computer system. Furthermore, the "computer-readable recording medium" also includes a medium that holds the program for a certain period of time, such as the volatile memory (RAM) inside a computer system serving as a server or client when the program is transmitted via a network such as the Internet or a communication line such as a telephone line.

Furthermore, the above program may be transmitted, from a computer system in whose storage device the program is stored, to another computer system via a transmission medium or by transmission waves in the transmission medium. Here, the "transmission medium" that transmits the program refers to a medium having a function of transmitting information, such as a network (communication network) like the Internet or a communication line (communication channel) like a telephone line. Furthermore, the above program may realize only part of the functions described above. Further, it may be a so-called differential file (differential program), which realizes the functions described above in combination with a program already recorded in the computer system.

[Industrial Applicability]

The present invention is applicable to high-accuracy classification and discrimination of information having a plurality of types of feature quantities, such as the detection of defects in glass articles and the like, and can also be utilized for manufacturing state detection devices and product manufacturing management devices.

[Brief Description of the Drawings]

Fig. 1 is a block diagram showing a configuration example of a clustering system according to the first and second embodiments of the present invention.
Fig. 2 is a table for explaining the processing of selecting a feature quantity set by means of the discrimination criterion value λ.
Fig. 3 is a table for explaining the processing of selecting a feature quantity set by means of the discrimination criterion value λ.
Fig. 4 is a histogram illustrating the effect of selecting a feature quantity set by means of the discrimination criterion value λ.
Fig. 5 is a flowchart showing an operation example of the processing of selecting a feature quantity set for each cluster in the first embodiment.
Fig. 6 is a flowchart showing an operation example of performing clustering processing on classification target data in the first embodiment.
Fig. 7 is a flowchart showing an operation example of generating the table of rule patterns used in the clustering processing of the second embodiment.
Fig. 8 is a flowchart showing an operation example of performing clustering processing on classification target data in the second embodiment.
Fig. 9 is a flowchart showing an operation example of performing other clustering processing on classification target data in the second embodiment.
Fig. 10 is a flowchart showing an operation example of performing clustering processing on classification target data in the third embodiment.
Fig. 11 is a flowchart showing an operation example of setting the arithmetic expression of the feature quantity transformation method.
Fig. 12 is a flowchart showing an operation example of calculating the evaluation value in the flowchart of Fig. 11.
Fig. 13 is a flowchart showing an operation example of calculating distances with feature quantities transformed by the set transformation methods.
Fig. 14 is a table showing the learning data belonging to each cluster.
Fig. 15 is a table showing the results of classifying the learning data of Fig. 14 by the method of the conventional example.
Fig. 16 is a view showing a method of calculating the total corrected agreement rate.
Fig. 17 is a table showing the results of classifying the learning data of Fig. 14 by the clustering system of the first embodiment.
Fig. 18 is a table showing the results of classifying the learning data by the clustering system of the second embodiment.
Fig. 19 is a table showing the results of classifying the learning data by the clustering system of the second embodiment.
Fig. 20 is a block diagram showing a configuration example of an inspection device using the clustering system of the present invention.
Fig. 21 is a flowchart showing an example of the selection operation of the feature quantity sets in the inspection device of Fig. 20.
Fig. 22 is a flowchart showing an operation example of the clustering processing in the inspection device of Fig. 20.
Fig. 23 is a block diagram showing a configuration example of a defect type determination device using the clustering system of the present invention.
Fig. 24 is a block diagram showing a configuration example of a manufacturing management device using the clustering system of the present invention.
Fig. 25 is a block diagram showing a configuration example of another manufacturing management device using the clustering system of the present invention.

[Description of main component symbols]
1: feature quantity set creation unit
2: feature quantity extraction unit
3: distance calculation unit
4: feature quantity set storage unit
5: cluster database
100: inspection object
101: image acquisition unit
102: illumination device
103: photographing device
104: defect candidate detection unit
105: clustering unit
200, 300: control device
201, 202: image capturing device
301, 302: manufacturing device
303: notification unit
304: recording unit

Claims (15)

1. A clustering system for classifying input data into clusters formed from populations of learning data, according to feature quantities of the input data, the system comprising: a feature quantity set storage unit that stores, for each of the clusters, a feature quantity set in which a combination of the feature quantities used for classification is registered; a feature quantity extraction unit that extracts predetermined feature quantities from the input data; a distance calculation unit that, for each feature quantity set corresponding to each cluster, calculates the distance between the center of the population of that cluster and the input data from the feature quantities contained in the feature quantity set, and outputs the result as a set distance; and a rank extraction unit that arranges the set distances in ascending order.

2. The clustering system according to claim 1, wherein a plurality of the feature quantity sets are provided for each cluster.

3. The clustering system according to claim 2, further comprising a cluster classification unit that detects which cluster the input data belongs to by means of a rule pattern representing classification criteria for assigning the input data to each cluster, the criteria being set according to the rank of each set distance among the set distances obtained for the respective feature quantity sets.

4. The clustering system according to claim 3, wherein the cluster classification unit detects which cluster the input data belongs to from the ranks of the set distances, and detects, as the cluster to which the input data belongs, the cluster having the largest number of highly ranked set distances.

5. The clustering system according to claim 4, wherein the cluster classification unit has a threshold value for the number of highly ranked set distances, and detects a cluster as the cluster to which the input data belongs only when its number of highly ranked set distances is equal to or greater than the threshold value.

6. The clustering system according to any one of claims 1 to 5, wherein the distance calculation unit multiplies each set distance by a correction coefficient set according to the feature quantities, thereby normalizing the set distances among the feature quantity sets.

7. The clustering system according to any one of claims 1 to 5, further comprising a feature quantity set creation unit that creates the feature quantity set for each cluster, wherein, for each of a plurality of combinations of feature quantities, the feature quantity set creation unit takes the mean value of the learning data of the population of a cluster as an origin, obtains the mean value of the distances between that origin and each item of learning data of the populations of the other clusters, and selects the combination of feature quantities yielding the largest mean value as the feature quantity set used to discriminate that cluster from the other clusters.

8. A defect type determination device, comprising the clustering system according to any one of claims 1 to 7, wherein the input data is image data of a product defect, and defects in the image data are classified by defect type using feature quantities representing the defects.

9. The defect type determination device according to claim 8, wherein the product is a glass article, and defects of the glass article are classified by defect type.

10. A defect detection device, comprising the defect type determination device according to claim 8 or 9, for detecting the type of a defect in a product.

11. A manufacturing state determination device, comprising the defect type determination device according to claim 8 or 9, which classifies defects of a product and detects the cause of defect occurrence in the manufacturing process from the correspondence between the defect type and the occurrence causes associated with that type.

12. A manufacturing state determination device, comprising the clustering system according to any one of claims 1 to 7, wherein the input data is feature quantities representing manufacturing conditions in the manufacturing process of a product, and the feature quantities are classified according to the manufacturing state of each step of the process.

13. The manufacturing state determination device according to claim 12, wherein the product is a glass article, and the feature quantities in the manufacturing process of the glass article are classified according to the manufacturing state of each step of the process.

14. A manufacturing state detection device, comprising the manufacturing state determination device according to claim 12 or 13, for detecting the category of the manufacturing state in each step of the manufacturing process of a product.

15. A product manufacturing management device, comprising the manufacturing state determination device according to claim 12 or 13, which detects the category of the manufacturing state in each step of the manufacturing process of a product and performs process control in the process steps according to the control items corresponding to that category.
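Claims 1 to 6 amount to a nearest-centroid classifier evaluated once per (cluster, feature quantity set) pair, followed by a vote over the resulting distance ranking. The sketch below (Python) is a minimal illustration of that reading, not the patented implementation: the names set_distances, classify, top_k and threshold are hypothetical, and the Euclidean metric is an assumption, since the claims do not fix a particular distance function.

    import numpy as np

    # Assumed layout: each cluster owns several feature quantity sets; each set
    # records the feature indices it uses, the population center restricted to
    # those indices, and a correction coefficient (claim 6) that normalizes set
    # distances across sets built from different feature quantities.
    def set_distances(x, clusters):
        out = []
        for name, feature_sets in clusters.items():
            for fs in feature_sets:
                d = np.linalg.norm(x[fs["indices"]] - fs["center"])
                out.append((name, d * fs["coef"]))  # normalized set distance
        return out

    def classify(x, clusters, top_k=3, threshold=2):
        # Rank extraction unit: sort set distances in ascending order (claim 1);
        # count how many of the top_k smallest distances each cluster owns
        # (claim 4) and reject when no cluster reaches the threshold (claim 5).
        ranked = sorted(set_distances(x, clusters), key=lambda t: t[1])
        votes = {}
        for name, _ in ranked[:top_k]:
            votes[name] = votes.get(name, 0) + 1
        best = max(votes, key=votes.get)
        return best if votes[best] >= threshold else None

    # Hypothetical two-cluster example with two feature quantity sets each:
    clusters = {
        "scratch": [{"indices": [0, 1], "center": np.array([0.2, 1.5]), "coef": 1.0},
                    {"indices": [2, 3], "center": np.array([4.0, 0.3]), "coef": 0.5}],
        "bubble":  [{"indices": [0, 2], "center": np.array([2.1, 3.9]), "coef": 1.0},
                    {"indices": [1, 3], "center": np.array([1.4, 0.4]), "coef": 0.5}],
    }
    label = classify(np.array([0.3, 1.4, 3.8, 0.2]), clusters)

Under these toy numbers classify returns "scratch", since that cluster owns two of the three smallest normalized set distances; an input whose top-ranked distances were split across clusters would return None, mirroring the reject behaviour implied by claim 5.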
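Claim 7 specifies how the per-cluster feature quantity sets are built in the first place. The following sketch illustrates that selection criterion under the assumption that candidate sets are fixed-size combinations of feature indices (the claim does not restrict the form of the combinations); create_feature_set and set_size are hypothetical names.

    from itertools import combinations

    import numpy as np

    def create_feature_set(own_data, other_data, set_size=2):
        # For each combination of feature indices (claim 7): take the mean of
        # this cluster's learning data as the origin, average the distances
        # from that origin to every learning sample of the other clusters, and
        # keep the combination with the largest average, i.e. the one that
        # best separates this cluster from the rest.
        best_combo, best_score = None, -np.inf
        for combo in combinations(range(own_data.shape[1]), set_size):
            idx = list(combo)
            origin = own_data[:, idx].mean(axis=0)
            score = np.linalg.norm(other_data[:, idx] - origin, axis=1).mean()
            if score > best_score:
                best_combo, best_score = combo, score
        return best_combo, best_score

Because the score rewards combinations under which the other clusters' learning data lie far from this cluster's center, each cluster can end up with a different feature quantity set; this is why claim 1 computes one set distance per (cluster, feature quantity set) pair rather than a single distance per cluster.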
TW096124741A 2006-07-06 2007-07-06 Cluster system and defect type determination device TWI434229B (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP2006186628 2006-07-06

Publications (2)

Publication Number Publication Date
TW200818060A true TW200818060A (en) 2008-04-16
TWI434229B TWI434229B (en) 2014-04-11

Family

ID=38894527

Family Applications (1)

Application Number Title Priority Date Filing Date
TW096124741A TWI434229B (en) 2006-07-06 2007-07-06 Cluster system and defect type determination device

Country Status (5)

Country Link
JP (1) JP5120254B2 (en)
KR (1) KR100998456B1 (en)
CN (1) CN101484910B (en)
TW (1) TWI434229B (en)
WO (1) WO2008004559A1 (en)

Families Citing this family (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP5163505B2 (en) * 2008-04-03 2013-03-13 新日鐵住金株式会社 Flaw learning device, flaw learning method, and computer program
JP5211810B2 (en) * 2008-04-03 2013-06-12 新日鐵住金株式会社 Flaw learning device, flaw learning method, and computer program
WO2010058759A1 (en) 2008-11-20 2010-05-27 旭硝子株式会社 Transparent body inspecting device
JP5465689B2 (en) * 2011-02-28 2014-04-09 株式会社日立製作所 High-precision similarity search system
JP5943722B2 (en) * 2012-06-08 2016-07-05 三菱重工業株式会社 Defect determination apparatus, radiation imaging system, and defect determination method
JP6359123B2 (en) * 2015-01-21 2018-07-18 三菱電機株式会社 Inspection data processing apparatus and inspection data processing method
CN107111643B (en) * 2015-01-22 2018-12-28 三菱电机株式会社 Time-series data retrieval device
JP6919779B2 (en) * 2016-12-20 2021-08-18 日本電気硝子株式会社 Glass substrate manufacturing method
KR102260976B1 (en) * 2017-10-30 2021-06-04 현대모비스 주식회사 Apparatus for manufacturing object false positive rejector
CN107941812B (en) * 2017-12-20 2021-07-16 联想(北京)有限公司 Information processing method and electronic equipment
JP2019204232A (en) * 2018-05-22 2019-11-28 株式会社ジェイテクト Information processing method, information processor, and program
CN112513892B (en) * 2018-07-31 2024-06-25 三菱电机株式会社 Information processing device, computer-readable recording medium, and information processing method
CN109522931A (en) * 2018-10-18 2019-03-26 深圳市华星光电半导体显示技术有限公司 Method and system for judging the aggregation of stacked defect maps
JP7028133B2 (en) 2018-10-23 2022-03-02 オムロン株式会社 Control system and control method
JP7270127B2 (en) * 2019-10-07 2023-05-10 パナソニックIpマネジメント株式会社 Classification system, classification method, and program
WO2021140865A1 (en) * 2020-01-08 2021-07-15 パナソニックIpマネジメント株式会社 Classification system, classification method, and program
JP6973544B2 (en) * 2020-03-31 2021-12-01 株式会社Sumco Status determination device, status determination method, and status determination program
CN111984812B (en) * 2020-08-05 2024-05-03 沈阳东软智能医疗科技研究院有限公司 Feature extraction model generation method, image retrieval method, device and equipment
CN112730427B (en) * 2020-12-22 2024-02-09 安徽康能电气有限公司 Product surface defect detection method and system based on machine vision
CN113312400B (en) * 2021-06-02 2024-01-30 蚌埠凯盛工程技术有限公司 Float glass grade judging method and device
KR102464945B1 (en) * 2021-08-18 2022-11-10 한국과학기술정보연구원 Apparatus and method for analyzing signal data state using machine learning
KR102795578B1 (en) * 2021-10-12 2025-04-15 경기대학교 산학협력단 Apparatus and method for generating image annotation based on shap
CN115687961B (en) * 2023-01-03 2023-06-27 苏芯物联技术(南京)有限公司 Automatic welding procedure intelligent recognition method based on pattern recognition

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP3095623B2 (en) * 1994-06-16 2000-10-10 松下電器産業株式会社 Attribute judgment method
JP2690027B2 (en) * 1994-10-05 1997-12-10 株式会社エイ・ティ・アール音声翻訳通信研究所 Pattern recognition method and apparatus
US6307965B1 (en) * 1998-04-30 2001-10-23 International Business Machines Corporation System and method for detecting clusters of information
JP4132229B2 (en) * 1998-06-03 2008-08-13 株式会社ルネサステクノロジ Defect classification method
JP3475886B2 (en) * 1999-12-24 2003-12-10 日本電気株式会社 Pattern recognition apparatus and method, and recording medium
JP2002099916A (en) * 2000-09-25 2002-04-05 Olympus Optical Co Ltd Pattern-classifying method, its device, and computer- readable storage medium
JP2003344300A (en) * 2002-05-21 2003-12-03 Jfe Steel Kk Surface defect determination method
JP2004165216A (en) * 2002-11-08 2004-06-10 Matsushita Electric Ind Co Ltd Production management method and production management device
JP4553300B2 (en) * 2004-09-30 2010-09-29 Kddi株式会社 Content identification device
JP2006105943A (en) * 2004-10-08 2006-04-20 Omron Corp Knowledge creation device, parameter search method, and program product

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI476676B (en) * 2008-09-29 2015-03-11 Sandisk Il Ltd File system for storage device which uses different cluster sizes

Also Published As

Publication number Publication date
WO2008004559A1 (en) 2008-01-10
JPWO2008004559A1 (en) 2009-12-03
CN101484910B (en) 2015-04-08
KR20090018920A (en) 2009-02-24
JP5120254B2 (en) 2013-01-16
KR100998456B1 (en) 2010-12-06
CN101484910A (en) 2009-07-15
TWI434229B (en) 2014-04-11

Similar Documents

Publication Publication Date Title
TW200818060A (en) Clustering system, and defect kind judging device
US11816579B2 (en) Method and apparatus for detecting defect pattern on wafer based on unsupervised learning
Umadevi et al. A survey on data mining classification algorithms
CN104169945B Two-stage classification of objects in images
US20040141641A1 (en) Seed image analyzer
CN106326915B (en) A Fault Diagnosis Method for Chemical Process Based on Improved Kernel Fisher
Al‐Tahhan et al. Accurate automatic detection of acute lymphatic leukemia using a refined simple classification
CN111226281B (en) Method and device for determining chromosome aneuploidy and constructing classification model
CN117152152B (en) Production management system and method for detection kit
Malkawi et al. White blood cells classification using convolutional neural network hybrid system
CN118507033B (en) Automatic interpretation and diagnosis support system for medical examination results
CN117565284A (en) Automatic control system and method for PVC film processing
Ming et al. Visual detection of sprouting in potatoes using ensemble‐based classifier
CN117523324B (en) Image processing method, image sample classification method, device and storage medium
Waegeman et al. A comparison of different ROC measures for ordinal regression
Javeed et al. Deep learning in hematology: Automated counting of blood cells using YOLOv5 object detection
CN120019419A (en) Detection of anomalies in sample images
CN112907306B (en) Customer satisfaction judging method and device
JP3865316B2 (en) Article discrimination method, article discrimination device, and program
CN115100462A (en) Socket classification method based on regression prediction
TWI884050B (en) Method and system for collecting negative samples
Singaravelan et al. Refining CBIR using rule based KNN
CN112508909A (en) Disease association method of peripheral blood cell morphology automatic detection system
CN118262181B (en) Automatic data processing system based on big data
Cervinka et al. Visual Measurement of Material Segregation in Steel Wires

Legal Events

Date Code Title Description
MM4A Annulment or lapse of patent due to non-payment of fees