TWI682330B - Self-learning data classification system and method - Google Patents
Self-learning data classification system and method
- Publication number
- TWI682330B (application TW107116402A)
- Authority
- TW
- Taiwan
- Prior art keywords
- data
- unit
- bases
- group
- subsystem
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/213—Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
- G06F18/2135—Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods based on approximation criteria, e.g. principal component analysis
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2218/00—Aspects of pattern recognition specially adapted for signal processing
- G06F2218/12—Classification; Matching
Landscapes
- Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Theoretical Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Artificial Intelligence (AREA)
- Bioinformatics & Computational Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
Description
The present invention relates to a data classification system and method, and in particular to a data classification system and method capable of improving its own classification ability during data classification.
With the spread of computers and the growth of networks, large volumes of data of many types have accumulated, and corresponding databases have been built. In the era of big data, such data is a valuable asset for enterprises and an indispensable tool in product development and application. Building this data depends on various kinds of processing applied to the signals that carry it, of which signal classification is the most critical. With accurate signal classification, data can be built that maps known events to known outcomes, and this data can then be used to analyze or predict the outcome of an event under test. Data classification products of this kind have been developed in succession. For example, a large number of physiological signals measured from the human body can be classified, and a person's health can be diagnosed from the classification results.
When signals are classified, the data signals being processed are often high-dimensional and therefore highly complex. Classifying these high-dimensional signals directly makes the classification system harder to build. To reduce the difficulty and cost of building the system, the high-dimensional signals are first pre-processed to reduce their dimension (dimension reduction), and the resulting low-dimensional signals are then classified. Known pre-processing approaches include two-dimensional convolutional neural network analysis, one-dimensional convolutional neural network analysis, and recurrent neural network analysis. These approaches, however, require a large amount of data and time, which is not cost-effective for a data classification system that values efficiency and results. Another pre-processing approach is principal component analysis (PCA), which reduces a data signal to a combination of specific bases (basis) and their corresponding indices (index). The specific bases are the common parts of a group of data signals that share the same data attributes as the data signal to be reduced, and the corresponding indices are the proportions of those bases in that data signal. As shown in FIG. 7, by transformation onto three common bases A, B, and C, each of five data signals can be expressed as a combination of these common bases and three corresponding indices. The accuracy of a classification system built this way depends on the accuracy of the dimension-reduced data signals, which in turn depends heavily on how the bases are chosen. If the bases are chosen purely by hand, bias is easily introduced and the best bases are hard to obtain. Moreover, manual selection cannot be standardized to yield quantitative metrics, so the accuracy of the classification system cannot be controlled.
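The FIG. 7 decomposition can be written compactly. As a sketch, assume each of the five data signals is a vector $x_i$ and its three indices are $a_i$, $b_i$, $c_i$ (these symbols are illustrative and do not appear in the original):

```latex
x_i \approx a_i\,A + b_i\,B + c_i\,C, \qquad i = 1, \dots, 5
```

Under this reading, each signal is represented after reduction by the three numbers $(a_i, b_i, c_i)$ instead of its full sample vector, because A, B, and C are shared by all five signals.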
Therefore, the technical problem addressed by the present invention is how to reduce the difficulty and cost of building a classification system while also reducing data signal complexity and improving signal classification accuracy, so that data classification products built on this signal classification can analyze or predict the outcome of an event under test more accurately.
In view of the above, the present invention provides a self-learning data classification system and method that generates bases through automatic computer learning. This eliminates the bias that manual selection may introduce, improves signal classification accuracy, and ensures that a data classification system built on this approach has a high classification recognition rate and an effective classification result.
In one aspect, the present invention provides a self-learning data classification system for classifying a data signal under test (also called a data signal to be classified). The system includes a first subsystem and a second subsystem. The first subsystem has: a database storing a group of training data, whose data signals have the same attributes as the data signal under test; a basis generation unit, electrically connected to the database, which generates from the training data a first set of bases representing the principal components of the training data's data signals, the first set containing at least one basis; and a returned-index receiving unit, electrically connected to the basis generation unit. The second subsystem is electrically connected or network-connected to the first subsystem and has: a data signal measurement unit for measuring the data signal under test; an index generation unit, electrically connected to the data signal measurement unit, which generates indices corresponding to the first set of bases from the first set of bases and the data signal under test; an index return unit, electrically connected to the index generation unit, which returns the generated indices to the first subsystem; and an index classification calculation unit, electrically connected to the index generation unit, which calculates the classification result of the data signal under test from the generated indices. The returned-index receiving unit receives the indices returned by the index return unit, and the basis generation unit generates, from the returned indices and the training data's data signals, a second set of bases different from the first set, the second set containing at least one basis.
In one embodiment, the basis generation unit has: a matrix conversion unit, electrically connected to the database, which converts at least part of the data signals in the training data into a matrix; an eigendecomposition unit, electrically connected to the matrix conversion unit, which performs a singular value decomposition (SVD) on the matrix to obtain a singular value and a corresponding singular vector; a matrix low-rank approximation unit, electrically connected to the eigendecomposition unit, which calculates the closest low-rank approximation of the matrix from the singular value and singular vector; and a multilayer perceptron unit, electrically connected to the matrix low-rank approximation unit, which receives the data signals corresponding to the closest low-rank matrix and outputs either the first set of bases or the second set of bases.
In one embodiment, the first subsystem further has a returned-index evaluation unit, electrically connected to the basis generation unit and the returned-index receiving unit, which evaluates the returned indices at intervals to determine whether they exceed a set threshold.
In one embodiment, the first subsystem further has a basis output unit, electrically connected to the basis generation unit, which sends the first and second sets of bases generated by the basis generation unit to the second subsystem over a network.
In one embodiment, the second subsystem further has a basis input unit, electrically connected to the data signal measurement unit, which receives the first and second sets of bases sent by the basis output unit.
In one embodiment, the data signal under test includes a vital sign.
In another aspect, the present invention provides a self-learning data classification method for classifying a data signal under test, including the following steps: generating, from a group of training data, a first set of bases representing the principal components of the training data's data signals, where the training data's data signals have the same attributes as the data signal under test and the first set contains at least one basis; generating, from a first set of data signals under test and the first set of bases, a first set of indices corresponding to the first set of bases; generating a second set of bases from the first set of indices and the training data's data signals, the second set containing at least one basis; and calculating the classification result of the first set of data signals under test from the first set of indices.
In one embodiment, the first and second sets of bases are generated on a first subsystem, while the first set of indices is generated and the classification result of the first set of data signals under test is calculated on a second subsystem. The second subsystem is remote from the first subsystem and is controlled by it.
In one embodiment, the method further includes the step of determining whether the first set of indices exceeds a set threshold.
In one embodiment, the method further includes the steps of: generating, from the second set of bases and a second set of data signals under test, a second set of indices corresponding to the second set of bases; and calculating the classification result of the second set of data signals under test from the second set of indices.
In the self-learning data classification system and method of the present invention, the bases that the second subsystem needs for pre-processing the measured data signals are generated by the first subsystem. This eliminates the bias that manual selection may introduce, improves both the efficiency of signal pre-processing and the accuracy of signal classification, and ensures that the classification results have a high and effective recognition rate. In addition, the first subsystem uses a multilayer perceptron architecture to generate the bases, so basis generation is entirely self-learning. Using matrix low-rank approximation in the inference process that generates the bases reduces the dimension of high-dimensional signals and reduces the number of input layer neurons and the number of hidden layers that the multilayer perceptron architecture needs, which lowers the difficulty and cost of building the classification system. Furthermore, synchronously returning to the first subsystem the indices that the second subsystem uses for the classification algorithm strengthens the inference process by which the basis generation unit generates bases, so that the multilayer perceptron unit of the first subsystem can generate the best bases and thereby improve the second subsystem's pre-processing of measured data signals. Finally, because the second subsystem's classification calculation runs independently of signal pre-processing, the computation of the classification algorithm can be adjusted more flexibly.
To make the above features and advantages of the present invention easier to understand, embodiments are described in detail below with reference to the accompanying drawings.
The present invention discloses a self-learning data classification system and method. The basic principles of data signal pre-processing and of the multilayer perceptron in neural network architectures are understood by those of ordinary skill in the art and are therefore not described in full below. Likewise, the drawings referred to below are intended to convey meaning related to the features of the invention; they are not, and need not be, drawn completely or to actual scale.
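FIG. 1 is a schematic diagram of the system architecture of a self-learning data classification system 10 according to an embodiment of the present invention. In this embodiment, the self-learning data classification system 10 has a first subsystem 100 and a second subsystem 200. The second subsystem 200 measures the data signal under test (also called the data signal to be classified) and uses a classification algorithm to obtain its classification result. The first subsystem 100 generates the bases that the second subsystem 200 needs when pre-processing the measured data signal; the second subsystem 200 uses them to obtain the corresponding indices and feeds those indices into the subsequent classification calculation. The first subsystem 100 is usually located remotely from the second subsystem 200, but the two can be linked over a network so that the bases generated by the first subsystem 100 and the indices generated by the second subsystem 200 can be sent and received. The first subsystem 100 is, for example, a server system, and the second subsystem 200 is, for example, a client system. In other embodiments, the first subsystem 100 and the second subsystem 200 can be electrically connected rather than communicating over a network, so that bases and indices are generated within the same hardware architecture.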
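As shown in FIG. 1, in one embodiment the first subsystem 100 has a database 101, a basis generation unit 102, a basis output unit 103, a returned-index receiving unit 104, and a returned-index evaluation unit 105, all electrically connected to one another. The database 101 stores a group of training data; the storage is implemented, for example, in memory. The data signals of the stored training data have the same attributes as the data signal under test measured by the second subsystem 200; for example, both are physiological signals carrying vital signs. During operation of the first subsystem 100, the basis generation unit 102 generates, from the training data in the database, a set of bases representing the principal components of the training data's data signals, and the basis output unit 103 sends the generated bases to the second subsystem 200 as an online update. Each set of bases contains at least one basis.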
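FIG. 2 is a schematic diagram of the functional structure of the basis generation unit 102 of the first subsystem 100 of the self-learning data classification system 10 according to an embodiment of the present invention. FIG. 3 is a schematic diagram of the inference process by which the basis generation unit 102 generates a set of bases from known data signals q in the training data. As shown in FIG. 2, in one embodiment the basis generation unit 102 has a matrix conversion unit 1021, an eigendecomposition unit 1022, a matrix low-rank approximation unit 1023, and a multilayer perceptron unit 1024, all electrically connected to one another. The matrix conversion unit 1021 converts at least part of the data signals in the training data, for example high-dimensional data signals q, into a matrix A. The eigendecomposition unit 1022 performs a singular value decomposition (SVD) on matrix A to obtain the singular values $\Sigma$ and the singular vectors $V^{T}$, where $A \approx U \Sigma V^{T}$ and U and V are both orthonormal matrices, that is, the vectors in each matrix are mutually perpendicular in the high-dimensional space and have length 1. Matrix A can thus be viewed as the projection of the data signals q onto the V coordinate system, with each coordinate scaled by the corresponding value on the diagonal of $\Sigma$ and then recombined in the U coordinate system. As shown in FIGS. 2 and 3, the matrix low-rank approximation unit 1023 selects the first few singular values $\Sigma'$ in order of magnitude, together with the corresponding singular vectors $V'^{T}$, calculates from them the closest low-rank approximation A' of matrix A, and then calculates the dimension-reduced data signals q' corresponding to A'. As shown in FIGS. 2 and 3, the multilayer perceptron unit 1024 receives each dimension-reduced data signal q' and, following the multilayer perceptron principle of neural network architectures, outputs a set of bases representing the principal components of the data signals q. Because the projected coordinate system already represents the principal components of the known data signals q, which are linearly independent of one another and ordered by eigenvalue magnitude, the more strongly correlated information in the original data signals is already concentrated along the coordinate axes. This reduces the number of input layer 10241 neurons and the number of hidden layers 10242 that the multilayer perceptron unit 1024 needs.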
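The SVD and low-rank approximation stages can be sketched with NumPy. This is a minimal illustration, not the patented implementation: it assumes each row of A is one training signal, uses a made-up rank-2 training matrix, and leaves out the multilayer perceptron stage. The function names and the choice k = 2 are hypothetical.

```python
import numpy as np

def low_rank_approximation(A, k):
    """Return the rank-k truncated-SVD approximation of A and its factors."""
    # Singular values come back sorted from largest to smallest.
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    U_k, s_k, Vt_k = U[:, :k], s[:k], Vt[:k, :]
    A_k = (U_k * s_k) @ Vt_k  # scale the columns of U_k by s_k, then recombine
    return A_k, U_k, s_k, Vt_k

def reduce_signals(A, Vt_k):
    """Project each row of A onto the k retained right singular vectors."""
    return A @ Vt_k.T

rng = np.random.default_rng(0)
# Hypothetical training matrix: 6 signals of 8 samples lying in a rank-2 subspace.
A = rng.normal(size=(6, 2)) @ rng.normal(size=(2, 8))
A_k, U_k, s_k, Vt_k = low_rank_approximation(A, k=2)
q_reduced = reduce_signals(A, Vt_k)  # one 2-dimensional vector per signal
```

Keeping the k largest singular values gives the best rank-k approximation of A in the Frobenius norm, which is why the retained rows of `Vt_k` are a natural low-dimensional coordinate system for the signals.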
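Referring again to FIG. 1, the returned-index receiving unit 104 receives the indices generated and returned by the second subsystem 200, and the returned-index evaluation unit 105 evaluates the returned indices at intervals to determine whether they exceed a set threshold. Evaluating at intervals means that the returned indices are evaluated only once per set time period, for example every 10 seconds, rather than continuously. When the interval evaluation finds the returned indices below the set threshold, interval evaluation continues. When it finds them above the set threshold, the basis generation unit 102 generates, from the returned indices and the data signals of the training data in the database, a new set of bases, different from the original set, representing the principal components of those data signals; the set contains at least one basis. During operation of the first subsystem 100, when indices generated by the second subsystem 200 are returned, the first subsystem 100 can, as the situation requires, revise the bases used by the second subsystem 200 through an online update, so that the second subsystem 200 classifies measured signals under test with the new set of bases.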
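The interval evaluation can be sketched as a small buffer that is scored once per period. The source does not say how a group of indices is compared with the threshold, so the scoring rule here (largest absolute index) is purely an assumption, as are the class and parameter names.

```python
class IntervalEvaluator:
    """Buffers returned indices and scores them once per period (hypothetical helper)."""

    def __init__(self, threshold, period_s=10.0, score=None):
        self.threshold = threshold
        self.period_s = period_s
        # Assumed scoring rule: the largest absolute index seen in the period.
        self.score = score or (lambda idx: max(abs(v) for v in idx))
        self._buffer = []
        self._last_eval = 0.0

    def receive(self, indices, now):
        """Store indices; return True when a new set of bases should be generated."""
        self._buffer.extend(indices)
        if now - self._last_eval < self.period_s:
            return False  # still inside the current period, do not evaluate yet
        self._last_eval = now
        if not self._buffer:
            return False
        exceeded = self.score(self._buffer) > self.threshold
        self._buffer.clear()
        return exceeded

evaluator = IntervalEvaluator(threshold=2.0)
first = evaluator.receive([0.5, 3.0], now=4.0)   # inside the first 10 s period
second = evaluator.receive([0.1], now=10.0)      # period elapsed, buffer is scored
third = evaluator.receive([0.2], now=25.0)       # next period, small indices only
```

Passing the timestamp in as `now` keeps the sketch deterministic; a deployed version would more likely read a monotonic clock.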
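As shown in FIG. 1, in one embodiment the second subsystem 200 has a data signal measurement unit 201, a basis input unit 202, an index generation unit 203, an index return unit 204, and an index classification calculation unit 205, all electrically connected to one another. The data signal measurement unit 201 measures, receives, and stores the data signal under test 210 that enters the second subsystem 200 through its input terminal 211; the storage is implemented, for example, in temporary memory. The data signal under test 210 has the same attributes as the data signals of the training data stored in the database 101 of the first subsystem 100; for example, it is a physiological signal. The basis input unit 202 receives the bases sent by the basis output unit 103 of the first subsystem 100. The index generation unit 203 performs an index calculation on the data signal under test 210 received by the data signal measurement unit 201, using the bases received by the basis input unit 202, and thereby generates the indices corresponding to those bases. The index return unit 204 returns the generated indices to the first subsystem 100, where they are received by the returned-index receiving unit 104. The index classification calculation unit 205 uses a classification algorithm, usually implemented as a computer program, to calculate the classification result 220 of the data signal under test 210 from the generated indices. The classification result 220 is finally output through the output terminal 212 of the second subsystem 200. Between measuring the data signal under test 210 and running the classification algorithm, the second subsystem 200 can return the generated indices to the first subsystem 100, which can use them to strengthen its ability to generate the best bases.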
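A minimal sketch of index generation followed by classification is given below. It rests on two assumptions that the source does not state: the received bases are orthonormal, so each index is simply an inner product, and the classifier is a nearest-centroid rule in index space. The bases, centroids, labels, and signal values are invented for illustration.

```python
import math

def generate_indices(signal, bases):
    """Inner product of the signal with each basis (valid for orthonormal bases)."""
    return [sum(s * b for s, b in zip(signal, basis)) for basis in bases]

def classify(indices, centroids):
    """Return the label whose centroid is nearest to the index vector."""
    return min(centroids, key=lambda label: math.dist(indices, centroids[label]))

# Hypothetical orthonormal bases for 4-sample signals.
bases = [[0.5, 0.5, 0.5, 0.5], [0.5, 0.5, -0.5, -0.5]]
# Hypothetical class centroids expressed in index space.
centroids = {"normal": [2.0, 0.0], "abnormal": [2.0, 1.5]}

signal = [1.8, 1.7, 0.3, 0.2]
indices = generate_indices(signal, bases)
label = classify(indices, centroids)
```

Because classification only sees the index vector, the classifier can be swapped without touching the pre-processing, which matches the independence between the two stages that the description emphasizes.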
FIG. 4 is a flowchart of the steps performed by the first subsystem 100 of the self-learning data classification system 10 when the system as a whole classifies data, according to an embodiment of the present invention. In this embodiment, the data classification method of the self-learning data classification system 10 has the following steps:
Step 601: Receive returned indices. As shown in FIG. 1, the returned-index receiving unit 104 of the first subsystem 100 of the self-learning data classification system 10 receives the indices returned by the index return unit 204 of the second subsystem 200.
Step 602: Determine whether there are returned indices. As shown in FIG. 1, the returned-index evaluation unit 105 of the first subsystem 100 determines whether any indices have been returned from the second subsystem 200. If not, go to step 603; if so, go to step 604.
Step 603: Generate, from the data signals of the training data in the database, a first set of bases representing the principal components of those data signals. As shown in FIGS. 1 to 3, the basis generation unit 102 of the first subsystem 100 generates the first set of bases from the training data in the database 101. Then go to step 607.
Step 604: Evaluate the returned indices at intervals. As shown in FIG. 1, the returned-index evaluation unit 105 of the first subsystem 100 evaluates the returned indices once per set time period, for example every 10 seconds. Then go to step 605.
Step 605: Determine whether the returned indices exceed the set threshold. As shown in FIG. 1, the returned-index evaluation unit 105 of the first subsystem 100 determines whether the returned indices exceed a set threshold. If not, return to step 604; if so, go to step 606.
Step 606: Generate, from the returned indices and the data signals of the training data in the database, a second set of bases representing the principal components of those data signals. The second set of bases differs from the first set and contains at least one basis. As shown in FIGS. 1 to 3, the basis generation unit 102 of the first subsystem 100 generates the new set of bases from the returned indices and the data signals of the training data in the database 101. Then go to step 607.
Step 607: Output the bases. As shown in FIG. 1, after the basis generation unit 102 generates the bases, the basis output unit 103 outputs them to the second subsystem 200.
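The branching in steps 601 to 607 can be sketched as a single function. The two callables are hypothetical placeholders for the evaluation unit and the basis generation unit, and the loop back to step 604 is modeled by returning `None` so that the caller invokes the function again on the next period.

```python
def first_subsystem_cycle(returned_indices, exceeds_threshold, generate_basis):
    """One pass through steps 601-607; returns the bases to output, or None."""
    # Steps 601-602: nothing returned yet, so generate the first set (step 603).
    if not returned_indices:
        return generate_basis(None)
    # Steps 604-605: below the threshold, keep evaluating and output nothing.
    if not exceeds_threshold(returned_indices):
        return None
    # Step 606: regenerate from the returned indices; the caller outputs it (607).
    return generate_basis(returned_indices)

def fake_generate(indices):
    # Stand-in for the basis generation unit: only labels which set was produced.
    return "first set" if indices is None else "second set"

def over_one(indices):
    # Stand-in threshold test (assumed rule: any index above 1.0).
    return max(indices) > 1.0

start = first_subsystem_cycle([], over_one, fake_generate)
quiet = first_subsystem_cycle([0.1], over_one, fake_generate)
update = first_subsystem_cycle([3.0], over_one, fake_generate)
```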
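FIG. 5 is a flowchart of the steps performed by the second subsystem 200 of the self-learning data classification system 10 when the system as a whole classifies data, according to an embodiment of the present invention. In this embodiment, the data classification method of the self-learning data classification system 10 has the following steps:
Step 701: Measure the data signal under test. As shown in FIG. 1, the data signal measurement unit 201 of the second subsystem 200 of the self-learning data classification system 10 measures and receives the data signal under test 210 that enters the second subsystem 200 through its input terminal 211. Then go to step 702.
Step 702: Generate indices using the input bases. As shown in FIG. 1, the basis input unit 202 of the second subsystem 200 receives the first and second sets of bases sent by the first subsystem 100, and the index generation unit 203 performs an index calculation on the received data signal under test 210 using either the first or the second set of bases received by the basis input unit 202, thereby generating indices corresponding to the first or second set of bases. Then go to steps 703 and 705.
Step 703: Calculate the classification result of the data signal under test from the generated indices. As shown in FIG. 1, the index classification calculation unit 205 uses a classification algorithm to calculate the classification result 220 of the data signal under test 210 from the generated indices. Then go to step 704.
Step 704: Output the classification result. As shown in FIG. 1, the classification result 220 is finally output through the output terminal 212 of the second subsystem 200.
Step 705: Return the indices. As shown in FIG. 1, the index return unit 204 of the second subsystem 200 returns the generated indices to the first subsystem 100 for reference by the basis generation unit 102.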
FIG. 6 is a flowchart of the steps of a self-learning data classification method according to another embodiment of the present invention. In this embodiment, the self-learning data classification method has the following steps:
Step 801: Generate, from a group of training data, a first set of bases representing the principal components of the training data's data signals. The training data's data signals have the same attributes as the data signal under test, and the first set contains at least one basis. Then go to step 802.
Step 802: Generate, from a first set of data signals under test and the first set of bases, a first set of indices corresponding to the first set of bases. Then go to steps 803 and 806.
Step 803: Determine whether the first set of indices exceeds a set threshold. Then go to step 804.
Step 804: When the first set of indices exceeds the set threshold, generate a second set of bases from the first set of indices and the training data's data signals. The second set contains at least one basis. Then go to step 805.
Step 805: Generate, from the second set of bases and a second set of data signals under test, a second set of indices corresponding to the second set of bases. Then go to step 806.
Step 806: Calculate the classification result of the first set of data signals under test from the first set of indices, or calculate the classification result of the second set of data signals under test from the second set of indices.
In one embodiment, steps 801, 803, and 804 can be performed on a first subsystem 100 as shown in FIG. 1, and steps 802, 805, and 806 can be performed on a second subsystem 200 as shown in FIG. 1. The second subsystem 200 is remote from the first subsystem 100 and is controlled by it.
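Steps 801 to 806 can be sketched end to end with the individual operations passed in as callables. Every callable and value below is a hypothetical stand-in. One behavior is an assumption the source does not spell out: when the threshold is not exceeded, the second batch is processed with the first set of bases.

```python
def run_method(batch1, batch2, training, make_basis, make_indices, exceeds, classify):
    """Sketch of steps 801-806 with pluggable operations (all names hypothetical)."""
    basis = make_basis(training, None)                  # step 801
    idx1 = [make_indices(x, basis) for x in batch1]     # step 802
    results = [classify(i) for i in idx1]               # step 806, first batch
    if exceeds(idx1):                                   # step 803
        basis = make_basis(training, idx1)              # step 804
    idx2 = [make_indices(x, basis) for x in batch2]     # step 805
    results += [classify(i) for i in idx2]              # step 806, second batch
    return results

# Toy stand-ins: a "basis" is a single gain and an "index" is the scaled sample.
results = run_method(
    batch1=[1.0, 6.0],
    batch2=[1.0],
    training=None,
    make_basis=lambda training, idx: 1.0 if idx is None else 2.0,
    make_indices=lambda x, basis: x * basis,
    exceeds=lambda idx: max(idx) > 5.0,
    classify=lambda i: "high" if i > 3.0 else "low",
)
```

In the split described above, the `make_basis` and `exceeds` calls would run on the first subsystem and the `make_indices` and `classify` calls on the second.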
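In the self-learning data classification system and method of the present invention, the bases that the second subsystem 200 needs for pre-processing the measured data signals are generated by the first subsystem 100. This eliminates the bias that manual selection may introduce, improves both the efficiency of signal pre-processing and the accuracy of signal classification, and ensures that the classification results have a high and effective recognition rate. In addition, the first subsystem 100 uses a multilayer perceptron architecture to generate the bases, so basis generation is entirely self-learning. Using matrix low-rank approximation in the inference process that generates the bases reduces the dimension of high-dimensional signals and reduces the number of input layer neurons and the number of hidden layers that the multilayer perceptron architecture needs, which lowers the difficulty and cost of building the classification system. Furthermore, synchronously returning to the first subsystem 100 the indices that the second subsystem 200 uses for the classification algorithm strengthens the inference process by which the basis generation unit 102 generates bases, so that the multilayer perceptron unit 1024 of the first subsystem 100 can generate the best bases and thereby improve the second subsystem 200's pre-processing of measured data signals. Finally, because the second subsystem's classification calculation runs independently of signal pre-processing, the computation of the classification algorithm can be adjusted more flexibly.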
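In application, when the data signal under test measured by the second subsystem 200 of the self-learning data classification system 10 is a physiological signal containing vital signs, the data signal measurement unit 201 can be a vital-sign measurement unit that measures body temperature, pulse, respiration, and blood pressure. In that case the second subsystem 200 can be a handheld physiological signal detector, and the first subsystem 100 can be a cloud server that remotely controls the handheld detector, with the two connected by a wired or wireless network. Because each kind of physiological signal data falls within a known range, the relationship between the bases and indices obtained after dimension reduction can be determined, which makes such signals well suited to the self-learning data classification system proposed here. In other words, any data signal for which the relationship between the bases and indices obtained after dimension reduction is determinable is suitable for classification with the self-learning data classification system and method of the present invention.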
Various embodiments of the present invention are disclosed above, but they are not intended to limit the invention. A person of ordinary skill in the art may make minor changes and refinements without departing from the spirit and scope of the invention, for example by integrating the first subsystem and the second subsystem. The scope of protection of the present invention is therefore defined by the appended claims.
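- 10: self-learning data classification system
- 100: first subsystem
- 101: database
- 102: basis generation unit
- 1021: matrix conversion unit
- 1022: eigendecomposition unit
- 1023: matrix low-rank approximation unit
- 1024: multilayer perceptron unit
- 10241: input layer
- 10242: hidden layer
- 103: basis output unit
- 104: returned-index receiving unit
- 105: returned-index evaluation unit
- 200: second subsystem
- 201: data signal measurement unit
- 202: basis input unit
- 203: index generation unit
- 204: index return unit
- 205: index classification calculation unit
- 210: data signal under test
- 211: input terminal
- 212: output terminal
- 220: classification result
- 601~607: steps
- 701~705: steps
- 801~806: steps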
FIG. 1 is a schematic diagram of the system architecture of a self-learning data classification system according to an embodiment of the present invention.
FIG. 2 is a schematic diagram of the functional structure of the basis generation unit of the first subsystem of a self-learning data classification system according to an embodiment of the present invention.
FIG. 3 is a schematic diagram of the inference process by which the basis generation unit of the first subsystem generates bases from known data signals in the training data, according to an embodiment of the present invention.
FIG. 4 is a flowchart of the steps performed by the first subsystem when the self-learning data classification system as a whole classifies data, according to an embodiment of the present invention.
FIG. 5 is a flowchart of the steps performed by the second subsystem when the self-learning data classification system as a whole classifies data, according to an embodiment of the present invention.
FIG. 6 is a flowchart of the steps of a self-learning data classification method according to another embodiment of the present invention.
FIG. 7 is a schematic diagram of the known reduction of high-dimensional signals to a combination of specific bases and their corresponding indices.
Claims (10)
Priority Applications (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| TW107116402A TWI682330B (en) | 2018-05-15 | 2018-05-15 | Self-learning data classification system and method |
| CN201910263690.2A CN110490216A (en) | 2018-05-15 | 2019-04-03 | A kind of self-study formula data sorting system and method |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| TW107116402A TWI682330B (en) | 2018-05-15 | 2018-05-15 | Self-learning data classification system and method |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| TW201947465A (en) | 2019-12-16 |
| TWI682330B (en) | 2020-01-11 |
Family
ID=68545811
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| TW107116402A TWI682330B (en) | 2018-05-15 | 2018-05-15 | Self-learning data classification system and method |
Country Status (2)
| Country | Link |
|---|---|
| CN (1) | CN110490216A (en) |
| TW (1) | TWI682330B (en) |
Families Citing this family (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| TWI808785B (en) * | 2022-06-10 | 2023-07-11 | 英業達股份有限公司 | Data splitting system and method for validating machine learning |
Citations (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US6993193B2 (en) * | 2002-03-26 | 2006-01-31 | Agilent Technologies, Inc. | Method and system of object classification employing dimension reduction |
| US20140324739A1 (en) * | 2010-06-09 | 2014-10-30 | Heiko Claussen | Systems and methods for learning of normal sensor signatures, condition monitoring and diagnosis |
| US20140343396A1 (en) * | 2003-07-01 | 2014-11-20 | Cardiomag Imaging, Inc. | Use of Machine Learning for Classification of Magneto Cardiograms |
| CN104408476A (en) * | 2014-12-08 | 2015-03-11 | 西安电子科技大学 | Deep sparse main component analysis-based polarimetric SAR image classification method |
| CN105528516A (en) * | 2015-12-01 | 2016-04-27 | 三门县人民医院 | Clinic pathology data classification method based on combination of principal component analysis and extreme learning machine |
Family Cites Families (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN102317786A (en) * | 2007-04-18 | 2012-01-11 | 特提斯生物科学公司 | Diabetes correlativity biological marker and method of application thereof |
| KR101236040B1 (en) * | 2011-04-06 | 2013-02-21 | 조선대학교산학협력단 | Fingerprint verification apparatus and method therefor using PCA |
| CN103646252B (en) * | 2013-12-05 | 2017-01-11 | 江苏大学 | Optimized fuzzy learning vector quantization apple classification method |
| EP3332357A1 (en) * | 2015-08-04 | 2018-06-13 | Siemens Aktiengesellschaft | Visual representation learning for brain tumor classification |
| CN107506787B (en) * | 2017-07-27 | 2019-09-10 | 陕西师范大学 | A kind of glue into concrete beam cracks classification method based on migration self study |
- 2018-05-15: TW application TW107116402A, patent TWI682330B, status: active
- 2019-04-03: CN application CN201910263690.2A, publication CN110490216A, status: pending
Also Published As
| Publication number | Publication date |
|---|---|
| TW201947465A (en) | 2019-12-16 |
| CN110490216A (en) | 2019-11-22 |