
TWI682330B - Self-learning data classification system and method - Google Patents

Info

Publication number
TWI682330B
Authority
TW
Taiwan
Prior art keywords
data
unit
bases
group
subsystem
Prior art date
Application number
TW107116402A
Other languages
Chinese (zh)
Other versions
TW201947465A (en)
Inventor
黃彥銘
Original Assignee
美爾敦股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 美爾敦股份有限公司
Priority to TW107116402A
Priority to CN201910263690.2A (CN110490216A)
Publication of TW201947465A
Application granted
Publication of TWI682330B

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G06F18/2135 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods based on approximation criteria, e.g. principal component analysis
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2218/00 Aspects of pattern recognition specially adapted for signal processing
    • G06F2218/12 Classification; Matching

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A self-learning data classification system includes a database storing a group of training data having the same features as a data signal to be measured; a basis generating unit electrically connected to the database to generate a first group of bases representing principal components of the data signals of the training data; an index generating unit that generates corresponding indices according to the first group of bases and the data signal to be measured; and an index classification computing unit electrically connected to the index generating unit to calculate a classification result of the data signal to be measured according to the generated indices. The basis generating unit generates, according to the generated indices, a second group of bases different from the first group of bases.

Description

Self-learning data classification system and method

The invention relates to a data classification system and method, and in particular to a data classification system and method that can improve its own classification ability during the data classification process.

The spread of computers and the growth of networks have led to the accumulation of large volumes of many types of data and to the creation of corresponding databases. In the era of big data, such data are valuable assets for enterprises and indispensable tools in product development and application. Building these data depends on various kinds of processing applied to the signals that carry them, among which signal classification is the most critical. Accurate signal classification makes it possible to build data that map known specific events to known specific results, and then to use those data to analyze or predict the outcome of an event under test; data classification system products of this kind have been developed one after another. For example, a large number of physiological signals already measured from the human body can be classified, and a person's health condition can be diagnosed from the classification results.

When signals are classified, the data signals being processed are often high-dimensional and therefore highly complex. Classifying these high-dimensional signals directly inevitably makes the classification system harder to build. To reduce the difficulty and cost of building the system, the high-dimensional signals must first be pre-processed and reduced in dimension (dimension reduction) into low-dimensional signals, which are then classified. Known signal pre-processing methods include two-dimensional convolutional neural network analysis, one-dimensional convolutional neural network analysis, and recurrent neural network analysis. However, these methods require a considerable amount of data and time, which is not cost-effective for a data classification system that emphasizes efficiency and results. Another pre-processing method is principal component analysis (PCA), which reduces a data signal to a combination of specific bases and their corresponding indices. The specific bases are the components shared by a group of data signals that have the same data attributes as the data signal to be reduced, and the corresponding indices are the proportions of these bases in the data signal to be reduced. As shown in FIG. 7, by transformation with the three common bases A, B, and C, five data signals can each be expressed as a combination of these common bases and three corresponding indices. However, the accuracy of a data classification system built in this way depends on the accuracy of the dimension-reduced data signals, which in turn is closely tied to how the bases are chosen. If the bases are chosen purely by hand, bias arises easily and the best bases are hard to obtain; moreover, manual choice cannot be standardized to yield quantitative metrics, so the accuracy of the classification system cannot be controlled.
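The idea of expressing signals as shared bases weighted by indices can be illustrated with a minimal sketch. It assumes the indices are obtained as least-squares coefficients, which the text does not specify; the basis vectors, the mixtures, and the helper name `indices_for` are all hypothetical values chosen for illustration and are not taken from FIG. 7.

```python
import numpy as np

# Three hypothetical shared bases (rows A, B, C), each of length 6.
# The numbers are illustrative only and are not taken from the patent.
bases = np.array([
    [1.0, 0.0, 0.0, 1.0, 0.0, 0.0],   # basis A
    [0.0, 1.0, 0.0, 0.0, 1.0, 0.0],   # basis B
    [0.0, 0.0, 1.0, 0.0, 0.0, 1.0],   # basis C
])

# Five signals built as known mixtures of the three bases.
true_indices = np.array([
    [1.0, 0.0, 0.0],
    [0.5, 0.5, 0.0],
    [0.0, 2.0, 1.0],
    [0.3, 0.3, 0.3],
    [1.0, 1.0, 1.0],
])
signals = true_indices @ bases        # shape (5, 6)


def indices_for(signals, bases):
    """Least-squares coefficients c such that c @ bases approximates signals.

    Assumption: the indices are plain linear weights of the bases.
    """
    coeffs, *_ = np.linalg.lstsq(bases.T, signals.T, rcond=None)
    return coeffs.T                   # one row of indices per signal


indices = indices_for(signals, bases)
print(indices.round(3))
```

Each six-sample signal is now described by only three numbers, which is the dimension reduction the paragraph refers to.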

Therefore, the technical problem to be solved by the present invention is how to reduce the difficulty and cost of building a classification system while also reducing data signal complexity and improving signal classification accuracy, so that data classification system products built on this signal classification can analyze or predict the outcome of an event under test more accurately in application.

In view of the above problems, the present invention provides a self-learning data classification system and method that generates bases through automatic computer learning. This eliminates the bias that manual selection may introduce, improves the accuracy of signal classification, and ensures that a data classification system built on this signal classification approach has a high classification recognition rate and an effective classification result.

In one aspect, the present invention provides a self-learning data classification system for classifying a data signal to be measured (also called a data signal to be classified). The system includes a first subsystem and a second subsystem. The first subsystem has a database storing a group of training data, the data signals of which have the same attributes as the data signal to be measured; a basis generating unit, electrically connected to the database, for generating from the training data a first group of bases representing the principal components of the data signals of the training data, the first group containing at least one basis; and a returned-index receiving unit electrically connected to the basis generating unit. The second subsystem is electrically connected or network-connected to the first subsystem and has a data signal measurement unit for measuring the data signal to be measured; an index generating unit, electrically connected to the data signal measurement unit, for generating indices corresponding to the first group of bases according to the first group of bases and the data signal to be measured; an index returning unit, electrically connected to the index generating unit, for returning the generated indices to the first subsystem; and an index classification computing unit, electrically connected to the index generating unit, for calculating a classification result of the data signal to be measured according to the generated indices. The returned-index receiving unit receives the indices returned by the index returning unit, and the basis generating unit generates, according to the returned indices and the data signals of the training data, a second group of bases different from the first group, the second group containing at least one basis.

In one embodiment, the basis generating unit has a matrix conversion unit, electrically connected to the database, which converts at least part of the data signals in the training data into a matrix; a feature decomposition unit, electrically connected to the matrix conversion unit, which performs a singular value decomposition (SVD) on the matrix to obtain a singular value and a corresponding singular vector; a matrix low-rank approximation processing unit, electrically connected to the feature decomposition unit, which calculates the closest low-rank approximation of the matrix from the singular value and the singular vector; and a multilayer perceptron unit, electrically connected to the matrix low-rank approximation processing unit, which receives the data signals corresponding to the closest low-rank matrix and outputs one of the first group of bases and the second group of bases.

In one embodiment, the first subsystem further has a returned-index evaluation unit, electrically connected to the basis generating unit and the returned-index receiving unit, for evaluating the returned indices periodically to determine whether they exceed a set threshold.

In one embodiment, the first subsystem further has a basis output unit, electrically connected to the basis generating unit, for transmitting the first and second groups of bases generated by the basis generating unit to the second subsystem over a network.

In one embodiment, the second subsystem further has a basis input unit, electrically connected to the data signal measurement unit, for receiving the first and second groups of bases transmitted by the basis output unit.

In one embodiment, the data signal to be measured includes a vital sign.

In another aspect, the present invention provides a self-learning data classification method for classifying a data signal to be measured, including the following steps: generating, from a group of training data, a first group of bases representing the principal components of the data signals of the training data, the data signals of the training data having the same attributes as the data signal to be measured and the first group containing at least one basis; generating a first group of indices corresponding to the first group of bases according to a first group of data signals to be measured and the first group of bases; generating a second group of bases, containing at least one basis, according to the first group of indices and the data signals of the training data; and calculating a classification result of the first group of data signals to be measured according to the first group of indices.

In one embodiment, the first and second groups of bases are generated on a first subsystem, while the first group of indices is generated and the classification result of the first group of data signals to be measured is calculated on a second subsystem that is remote from, and controlled by, the first subsystem.

In one embodiment, the self-learning data classification method further includes the following step: determining whether the first group of indices exceeds a set threshold.

In one embodiment, the self-learning data classification method further includes the following steps: generating a second group of indices corresponding to the second group of bases according to the second group of bases and a second group of data signals to be measured; and calculating a classification result of the second group of data signals to be measured according to the second group of indices.

In the self-learning data classification system and method proposed by the present invention, the bases that the second subsystem needs for pre-processing the measured data signals are generated by the first subsystem. This eliminates the bias that manual selection may introduce, improves both the efficiency of signal pre-processing and the accuracy of signal classification, and ensures a high and effective classification recognition rate. In addition, the first subsystem uses a multilayer perceptron architecture to generate the bases, so basis generation is entirely self-learning. Using matrix low-rank approximation in the inference process that generates the bases effectively reduces the dimension of high-dimensional signals and reduces the number of input layer neurons and the number of hidden layers that the multilayer perceptron architecture requires, which lowers the difficulty and cost of building the classification system. Furthermore, returning the indices that the second subsystem uses in its classification algorithm to the first subsystem in step with classification strengthens the inference process by which the basis generating unit of the first subsystem generates bases, so that the multilayer perceptron unit of the first subsystem can generate optimal bases and thereby improve the second subsystem's pre-processing of the measured data signals. Finally, because the classification computation of the second subsystem can be carried out independently of signal pre-processing, the operating mechanism of the classification algorithm can be adjusted more flexibly.

To make the above features and advantages of the present invention more readily understood, embodiments are described in detail below with reference to the accompanying drawings.

The present invention discloses a self-learning data classification system and method. The basic principles of data signal pre-processing and of multilayer perceptrons in neural network architectures are well understood by those of ordinary skill in the art and are therefore not described in full below. The drawings referred to below are intended to convey meaning related to the features of the invention; they are not, and need not be, drawn to actual scale.

FIG. 1 is a schematic diagram of the system architecture of a self-learning data classification system 10 according to an embodiment of the present invention. In this embodiment, the self-learning data classification system 10 has a first subsystem 100 and a second subsystem 200. The second subsystem 200 measures the data signal to be measured (also called the data signal to be classified) and uses a classification algorithm to obtain its classification result. The first subsystem 100 generates the bases that the second subsystem 200 needs to pre-process the measured data signal; the second subsystem 200 uses these bases to obtain the corresponding indices and applies the indices in the subsequent classification computation. The first subsystem 100 is usually located remotely from the second subsystem 200, but the two can be linked over a network so that the bases generated by the first subsystem 100 and the indices generated by the second subsystem 200 can be transmitted and received. The first subsystem 100 is, for example, a server system, and the second subsystem 200 is, for example, a client system. In other embodiments, the first subsystem 100 and the second subsystem 200 may be electrically connected rather than communicating over a network, so that the bases and the indices are generated within the same hardware architecture.

As shown in FIG. 1, in one embodiment the first subsystem 100 has a database 101, a basis generating unit 102, a basis output unit 103, a returned-index receiving unit 104, and a returned-index evaluation unit 105, which are electrically connected to one another. The database 101 stores a group of training data; the storage is implemented with, for example, memory. The data signals of the stored training data have the same attributes as the data signal to be measured by the second subsystem 200; for example, both are physiological signals carrying vital signs. During operation of the first subsystem 100, the basis generating unit 102 generates, from the training data in the database, a group of bases representing the principal components of the data signals of the training data, and the basis output unit 103 transmits the generated bases to the second subsystem 200 as an online update. Each group of bases contains at least one basis.

FIG. 2 is a schematic diagram of the functional structure of the basis generating unit 102 of the first subsystem 100 of the self-learning data classification system 10 according to an embodiment of the present invention. FIG. 3 is a schematic diagram of the inference process by which the basis generating unit 102 generates a group of bases from a known data signal q in the training data. As shown in FIG. 2, in one embodiment the basis generating unit 102 has a matrix conversion unit 1021, a feature decomposition unit 1022, a matrix low-rank approximation processing unit 1023, and a multilayer perceptron unit 1024, which are electrically connected to one another. The matrix conversion unit 1021 converts at least part of the data signals in the training data, for example a high-dimensional data signal q, into a matrix A. The feature decomposition unit 1022 performs a singular value decomposition (SVD) on the matrix A to obtain the singular values Σ and the singular vectors V^T, where A ≈ UΣV^T and U and V are both orthonormal matrices, that is, the vectors in each matrix are mutually perpendicular in the high-dimensional space and have length 1. The matrix A can thus be regarded as the projection of the data signal q onto the coordinate system of V, with each coordinate value scaled by the corresponding value on the diagonal of Σ and then recombined in the coordinate system of U. As shown in FIGS. 2 and 3, the matrix low-rank approximation processing unit 1023 selects the first few singular values Σ', sorted by magnitude, together with the corresponding singular vectors V'^T, calculates from them the closest low-rank approximation A' of the matrix A, and then calculates the dimension-reduced data signal q' corresponding to A'. As shown in FIGS. 2 and 3, the multilayer perceptron unit 1024 receives each dimension-reduced data signal q' and, following the principle of multilayer perceptrons in neural network architectures, outputs a group of bases representing the principal components of the data signals q. Because the projected coordinate system already represents the principal components of the known data signal q, which are linearly independent of one another and sorted by eigenvalue magnitude, the more closely related information in the original data signal is already concentrated along the coordinate axes. This effectively reduces the number of neurons in the input layer 10241 and the number of hidden layers 10242 that the multilayer perceptron unit 1024 requires.
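The SVD and low-rank approximation stages described above can be sketched with NumPy as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the training matrix is synthetic, the rank k = 3 is chosen by hand, the function name `low_rank_approximation` is hypothetical, and the multilayer perceptron stage that turns q' into bases is deliberately left out.

```python
import numpy as np


def low_rank_approximation(A, k):
    """Truncated SVD: return the rank-k approximation A', the reduced
    coordinates of each row of A, and all singular values."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)   # A = U @ diag(s) @ Vt
    U_k, s_k, Vt_k = U[:, :k], s[:k], Vt[:k, :]        # keep the k largest
    A_k = (U_k * s_k) @ Vt_k                           # A' = U' Σ' V'^T
    reduced = A @ Vt_k.T                               # rows projected onto V'
    return A_k, reduced, s


# Synthetic stand-in for the training matrix A (an assumption): 20 signals
# of length 50 that lie close to a 3-dimensional subspace.
rng = np.random.default_rng(0)
latent = rng.normal(size=(20, 3))
mixing = rng.normal(size=(3, 50))
A = latent @ mixing + 0.01 * rng.normal(size=(20, 50))

A_k, q_reduced, singular_values = low_rank_approximation(A, k=3)
relative_error = np.linalg.norm(A - A_k) / np.linalg.norm(A)
print(q_reduced.shape, relative_error)
```

Here each 50-sample signal is reduced to 3 coordinates, so a network consuming `q_reduced` would need 3 input neurons rather than 50, which is the saving the paragraph attributes to the low-rank step.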

Referring again to FIG. 1, the returned-index receiving unit 104 receives the indices generated and returned by the second subsystem 200, and the returned-index evaluation unit 105 evaluates the returned indices periodically to determine whether they exceed a set threshold. Periodic evaluation means that the returned indices are evaluated once per set time period, for example every 10 seconds, rather than continuously. When the periodic evaluation finds the returned indices below the set threshold, periodic evaluation continues. When it finds them above the set threshold, the basis generating unit 102 generates, from the returned indices and the data signals of the training data in the database, a new group of bases, different from the original ones, that represents the principal components of those data signals; the new group contains at least one basis. Thus, when indices generated by the second subsystem 200 are returned during operation of the first subsystem 100, the first subsystem 100 can, as circumstances require, revise through an online update the bases that the second subsystem 200 uses, so that the second subsystem 200 can classify the measured signals with the new group of bases.
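The periodic evaluation logic can be sketched as a small state machine. The class name, the method name, and the way a vector of indices is reduced to a single score (largest absolute value) are assumptions made for illustration; the text only states that the indices are compared with a set threshold once per set time period.

```python
class PeriodicIndexEvaluator:
    """Hypothetical stand-in for the returned-index evaluation unit 105."""

    def __init__(self, threshold, interval_s=10.0):
        self.threshold = threshold
        self.interval_s = interval_s      # set time period, e.g. 10 seconds
        self._last_eval = None            # time of the most recent evaluation

    def should_regenerate(self, indices, now_s):
        """Return True only when an evaluation is due and the score exceeds
        the threshold, i.e. when a new group of bases should be generated."""
        if self._last_eval is not None and now_s - self._last_eval < self.interval_s:
            return False                  # not due yet, skip this evaluation
        self._last_eval = now_s
        score = max(abs(v) for v in indices)   # assumed scalar summary
        return score > self.threshold


evaluator = PeriodicIndexEvaluator(threshold=1.0, interval_s=10.0)
decisions = [
    evaluator.should_regenerate([0.2, 0.4], now_s=0.0),    # evaluated, below threshold
    evaluator.should_regenerate([5.0, 0.1], now_s=3.0),    # skipped, only 3 s elapsed
    evaluator.should_regenerate([0.3, 0.9], now_s=10.0),   # evaluated, below threshold
    evaluator.should_regenerate([1.5, 0.2], now_s=20.0),   # evaluated, above threshold
]
print(decisions)
```

Passing the time in as an argument rather than reading a clock keeps the sketch deterministic; a real deployment would presumably use a timer or scheduler instead.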

On the other hand, as shown in FIG. 1, in one embodiment the second subsystem 200 has a data signal measurement unit 201, a basis input unit 202, an index generating unit 203, an index returning unit 204, and an index classification computing unit 205, which are electrically connected to one another. The data signal measurement unit 201 measures and receives the data signal to be measured 210, which enters the second subsystem 200 through its input terminal 211, and stores it, for example in temporary memory. The data signal to be measured 210 has the same attributes as the data signals of the training data stored in the database 101 of the first subsystem 100; it is, for example, a physiological signal. The basis input unit 202 receives the bases transmitted by the basis output unit 103 of the first subsystem 100. The index generating unit 203 performs an index calculation on the data signal to be measured 210 received by the data signal measurement unit 201, using the bases received by the basis input unit 202, and thereby generates the indices corresponding to those bases. The index returning unit 204 returns the generated indices to the first subsystem 100, where they are received by the returned-index receiving unit 104. The index classification computing unit 205 applies a classification algorithm, usually implemented as a computer program, to the generated indices to calculate the classification result 220 of the data signal to be measured 210. The classification result 220 is finally output through the output terminal 212 of the second subsystem 200. While the second subsystem 200 proceeds from measuring the data signal to be measured 210 to running the classification algorithm, the generated indices can be returned to the first subsystem 100, so that the first subsystem 100 can use them to strengthen its ability to generate optimal bases.
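The data flow through the second subsystem can be sketched as a small client class. Everything named here is hypothetical: the class and method names, the use of least squares for index generation, the nearest-centroid rule standing in for the unspecified classification algorithm, the "normal" and "abnormal" labels, and the in-memory `outbox` list standing in for the network return path.

```python
import numpy as np


class SecondSubsystem:
    """Hypothetical client: measured signal -> indices -> class label."""

    def __init__(self, bases, centroids):
        self.bases = np.asarray(bases, dtype=float)   # from the basis input unit
        self.centroids = {label: np.asarray(c, dtype=float)
                          for label, c in centroids.items()}
        self.outbox = []                              # stands in for the index returning unit

    def update_bases(self, bases):
        """Accept a new group of bases pushed by the first subsystem."""
        self.bases = np.asarray(bases, dtype=float)

    def process(self, signal):
        signal = np.asarray(signal, dtype=float)
        # Index generation: least-squares weight of each basis in the signal.
        indices, *_ = np.linalg.lstsq(self.bases.T, signal, rcond=None)
        self.outbox.append(indices)                   # queued for return to the server
        # Classification: nearest centroid in index space (assumed algorithm).
        label = min(self.centroids,
                    key=lambda k: np.linalg.norm(indices - self.centroids[k]))
        return label, indices


client = SecondSubsystem(
    bases=[[1, 0, 0, 0], [0, 1, 0, 0]],
    centroids={"normal": [1, 0], "abnormal": [0, 1]},
)
label_a, idx_a = client.process([0.9, 0.1, 0.0, 0.0])
label_b, idx_b = client.process([0.2, 1.1, 0.0, 0.0])
print(label_a, label_b, len(client.outbox))
```

Because classification only sees the indices, the classifier can be swapped without touching index generation, which mirrors the independence between classification and pre-processing that the description emphasizes.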

FIG. 4 is a flow chart of the steps performed by the first subsystem 100 of the self-learning data classification system 10 when the system as a whole classifies data, according to an embodiment of the present invention. In this embodiment, the data classification method of the self-learning data classification system 10 has the following steps:

Step 601: Receive the returned indices. As shown in FIG. 1, the returned-index receiving unit 104 of the first subsystem 100 of the self-learning data classification system 10 receives the indices returned by the index returning unit 204 of the second subsystem 200.

步驟602:判斷是否有回傳引數。如圖1所示,自學式資料分類系統10的第一子系統100的回傳引數評價單元105判斷是否有來自第二子系統200的回傳引數。否的話,執行步驟603;是的話,執行步驟604。Step 602: Determine whether there are return parameters. As shown in FIG. 1, the loopback parameter evaluation unit 105 of the first subsystem 100 of the self-learning data classification system 10 determines whether there are loopback parameters from the second subsystem 200. If no, go to step 603; if yes, go to step 604.

步驟603:依據資料庫中的訓練資料的資料訊號生成一第一組代表訓練資料的資料訊號的主成份的基底。如圖1至3所示,第一子系統100的基底生成單元102依據資料庫101中的訓練資料生成一第一組代表訓練資料的資料訊號的主成份的基底。接著,執行步驟607。Step 603: According to the data signal of the training data in the database, generate a first set of bases representing the main components of the data signal of the training data. As shown in FIGS. 1 to 3, the base generation unit 102 of the first subsystem 100 generates a base of the first component representing the main components of the data signal of the training data according to the training data in the database 101. Then, step 607 is executed.

步驟604:分次評價回傳引數。如圖1所示,第一子系統100的回傳引數評價單元105每經過一設定時間段,例如10秒,對回傳的引數進行評價,接著執行步驟605。Step 604: Evaluate the return parameters in stages. As shown in FIG. 1, the feedback parameter evaluation unit 105 of the first subsystem 100 evaluates the returned parameters every time a set period of time, for example, 10 seconds, and then executes step 605.

步驟605:判斷回傳的引數是否高於設定閾值。如圖1所示,第一子系統100的回傳引數評價單元105判斷回傳的引數是否高於一設定閾值。否的話,回到步驟604,是的話,則執行步驟606。Step 605: Determine whether the returned parameter is higher than the set threshold. As shown in FIG. 1, the loopback parameter evaluation unit 105 of the first subsystem 100 determines whether the loopback parameters are higher than a set threshold. If not, go back to step 604. If yes, go to step 606.

步驟606: 依據回傳的引數及資料庫中的訓練資料的資料訊號生成一第二組代表訓練資料的資料訊號的主成份的基底。第二組基底不同於第一組基底,第二組基底的基底個數至少為一。如圖1至3所示,第一子系統100的基底生成單元102依據回傳的引數及資料庫101中的訓練資料的資料訊號生成新的一組代表訓練資料的資料訊號的主成份的基底。接著,執行步驟607。Step 606: Generate a second set of bases representing the main components of the data signal of the training data based on the returned parameters and the data signal of the training data in the database. The second group of substrates is different from the first group of substrates, and the number of substrates of the second group of substrates is at least one. As shown in FIGS. 1 to 3, the base generation unit 102 of the first subsystem 100 generates a new set of main components representing the data signal of the training data based on the returned parameters and the data signal of the training data in the database 101 Base. Then, step 607 is executed.

步驟607: 輸出基底。如圖1所示,在基底生成單元102生成基底後,由基底輸出單元103輸出基底至第二子系統200。Step 607: Output the substrate. As shown in FIG. 1, after the substrate generating unit 102 generates the substrate, the substrate output unit 103 outputs the substrate to the second subsystem 200.
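The branching in steps 602 to 607 can be sketched in Python as follows. This is an illustration under stated assumptions, not the patented implementation: the description does not say how a set of returned parameters is compared with the threshold, so the sketch assumes that a round exceeds the threshold when its largest absolute parameter does. The function name and the two generator callables are hypothetical placeholders for the basis generation unit 102.

```python
from typing import Callable, List, Optional, Sequence

Bases = List[List[float]]


def first_subsystem_cycle(
    returned_rounds: Sequence[Sequence[float]],
    threshold: float,
    generate_first_bases: Callable[[], Bases],
    generate_second_bases: Callable[[Sequence[float]], Bases],
) -> Optional[Bases]:
    """Return the bases to output (step 607), or None if no round qualifies."""
    # Step 602: no returned parameters, so fall back to the training data only.
    if not returned_rounds:
        return generate_first_bases()  # step 603
    # Steps 604 and 605: evaluate the returned parameters one round at a time.
    for parameters in returned_rounds:
        # Assumed criterion: largest absolute parameter above the threshold.
        if max((abs(p) for p in parameters), default=0.0) > threshold:
            return generate_second_bases(parameters)  # step 606
    # No round exceeded the threshold yet; the caller keeps waiting (step 604).
    return None
```

Returning `None` models the loop back to step 604: the caller would invoke the function again after the next evaluation period has elapsed.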

FIG. 5 is a flowchart of the steps carried out by the second subsystem 200 of the self-learning data classification system 10, according to an embodiment of the present invention, when the system as a whole performs data classification. In this embodiment, the data classification method of the self-learning data classification system 10 has the following steps:

Step 701: Measure the data signal under test. As shown in FIG. 1, the data signal measurement unit 201 of the second subsystem 200 of the self-learning data classification system 10 measures and receives the data signal under test 210 that enters the second subsystem 200 through its input terminal 211. Step 702 is then executed.

Step 702: Generate parameters from the input bases. As shown in FIG. 1, the basis input unit 202 of the second subsystem 200 receives the first set of bases and the second set of bases transmitted by the first subsystem 100, and the parameter generation unit 203 performs a parameter calculation on the received data signal under test 210 using one of the two sets, thereby generating parameters corresponding to the first set of bases or to the second set of bases. Steps 703 and 705 are then executed.

Step 703: Compute the classification result of the data signal under test from the generated parameters. As shown in FIG. 1, the parameter classification calculation unit 205 applies a classification algorithm to the generated parameters to compute the classification result 220 of the data signal under test 210. Step 704 is then executed.

Step 704: Output the classification result. As shown in FIG. 1, the classification result 220 is finally output through the output terminal 212 of the second subsystem 200.

Step 705: Return the parameters. As shown in FIG. 1, the parameter return unit 204 of the second subsystem 200 returns the generated parameters to the first subsystem 100 for reference by the basis generation unit 102.

FIG. 6 is a flowchart of the steps of a self-learning data classification method according to another embodiment of the present invention. In this embodiment, the self-learning data classification method has the following steps:

Step 801: Generate, from a group of training data, a first set of bases representing the principal components of the data signals of the training data. The data signals of the training data have the same attributes as the data signals under test, and the first set of bases contains at least one basis. Step 802 is then executed.

Step 802: Generate, from a first set of data signals under test and the first set of bases, a first set of parameters corresponding to the first set of bases. Steps 803 and 806 are then executed.

Step 803: Determine whether the first set of parameters exceeds a set threshold. Step 804 is then executed.

Step 804: When the first set of parameters exceeds the set threshold, generate a second set of bases from the first set of parameters and the data signals of the training data. The second set of bases contains at least one basis. Step 805 is then executed.

Step 805: Generate, from the second set of bases and a second set of data signals under test, a second set of parameters corresponding to the second set of bases. Step 806 is then executed.

Step 806: Compute the classification result of the first set of data signals under test from the first set of parameters, or compute the classification result of the second set of data signals under test from the second set of parameters.

In one embodiment, steps 801, 803, and 804 may be performed on a first subsystem 100 as shown in FIG. 1, and steps 802, 805, and 806 may be performed on a second subsystem 200 as shown in FIG. 1, the second subsystem 200 being remote from the first subsystem 100 and controlled by it.
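The ordering of steps 801 to 806 can be summarized in a short Python sketch. All of the callables passed in (`make_bases`, `refine_bases`, `project`, `classify`, `exceeds`) are hypothetical placeholders, because the description leaves the concrete basis generation, parameter calculation, threshold test, and classification algorithm open; only the control flow is taken from the steps above.

```python
from typing import Any, Callable, List, Sequence


def run_method(
    training_data: Any,
    first_signals: Sequence[Any],
    second_signals: Sequence[Any],
    threshold: float,
    make_bases: Callable[[Any], Any],
    refine_bases: Callable[[List[Any], Any], Any],
    project: Callable[[Any, Any], Any],
    classify: Callable[[Any], Any],
    exceeds: Callable[[List[Any], float], bool],
) -> List[Any]:
    """Run steps 801 to 806 once and return the classification results."""
    first_bases = make_bases(training_data)                          # step 801
    first_params = [project(s, first_bases) for s in first_signals]  # step 802
    results = [classify(p) for p in first_params]                    # step 806
    if exceeds(first_params, threshold):                             # step 803
        second_bases = refine_bases(first_params, training_data)     # step 804
        second_params = [project(s, second_bases)
                         for s in second_signals]                    # step 805
        results += [classify(p) for p in second_params]              # step 806
    return results
```

In a split deployment, the calls to `make_bases`, `exceeds`, and `refine_bases` would run on the first subsystem and the calls to `project` and `classify` on the second, matching the division of steps given above.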

In the self-learning data classification system and method proposed by the present invention, the bases that the second subsystem 200 needs for pre-processing the measured data signals are generated by the first subsystem 100. This eliminates the bias that manual selection of bases can introduce, improves both the efficiency of signal pre-processing and the accuracy of signal classification, and ensures that the classification results have a high and effective recognition rate. In addition, the first subsystem 100 uses a multilayer perceptron architecture to generate the bases, so the basis generation process is entirely self-learning. Using matrix low-rank approximation in the inference process of basis generation effectively reduces the dimensionality of high-dimensional signals and reduces the number of input-layer neurons and hidden layers that the multilayer perceptron architecture requires, which lowers the difficulty and cost of building the classification system. Furthermore, synchronously returning to the first subsystem 100 the parameters that the second subsystem 200 uses for the classification algorithm strengthens the inference process by which the basis generation unit 102 of the first subsystem 100 generates bases, so that the multilayer perceptron unit 1024 of the first subsystem 100 can generate the optimal bases and thereby improve the performance of the second subsystem 200 in pre-processing the measured data signals. Finally, because the classification computation of the second subsystem can be carried out independently of the signal pre-processing, the computation mechanism of the classification algorithm can be adjusted more flexibly.
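The dimensionality reduction by matrix low-rank approximation mentioned above can be illustrated with a small Python sketch. The sketch computes only the rank-one case: it estimates the largest singular value and its singular vectors by power iteration, which is a technique chosen here for brevity and is not stated in the description, and forms the closest rank-one matrix from them. The function name and the sample matrix are hypothetical.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def rank_one_approximation(
    matrix: Matrix, iterations: int = 100
) -> Tuple[float, List[float], List[float], Matrix]:
    """Return (sigma, u, v, approx) where approx = sigma * u * v^T."""
    rows, cols = len(matrix), len(matrix[0])
    # Start from a uniform vector; this assumes it is not orthogonal to the
    # dominant right singular vector.
    v = [1.0 / math.sqrt(cols)] * cols
    for _ in range(iterations):
        # One power-iteration step on A^T A.
        av = [sum(matrix[i][j] * v[j] for j in range(cols)) for i in range(rows)]
        atav = [sum(matrix[i][j] * av[i] for i in range(rows)) for j in range(cols)]
        norm = math.sqrt(sum(x * x for x in atav))
        if norm == 0.0:
            break
        v = [x / norm for x in atav]
    av = [sum(matrix[i][j] * v[j] for j in range(cols)) for i in range(rows)]
    sigma = math.sqrt(sum(x * x for x in av))  # largest singular value
    u = [x / sigma for x in av] if sigma else [0.0] * rows
    approx = [[sigma * u[i] * v[j] for j in range(cols)] for i in range(rows)]
    return sigma, u, v, approx


# Hypothetical training matrix: three signals of two samples each, rank one.
training_matrix = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]
sigma, u, v, approx = rank_one_approximation(training_matrix)
```

Here the three training signals are all multiples of one direction, so the single right singular vector `v` captures them completely; in general, more singular vectors would be retained and the approximation would discard only the weaker components.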

In application, when the data signal under test measured by the second subsystem 200 of the proposed self-learning data classification system 10 is a physiological signal containing vital signs, the data signal measurement unit 201 may be a vital sign measurement unit that measures the body temperature, pulse, respiration, and blood pressure of a human body. In this case, the second subsystem 200 may be a handheld physiological signal detector, and the first subsystem 100 may be a cloud server that remotely controls the handheld physiological signal detector; the detector and the cloud server are connected by a wired or wireless network. Because the data of each kind of physiological signal falls within a known range, the relationship between the bases and the parameters obtained after dimensionality reduction can be determined, which makes such signals well suited to the self-learning data classification system proposed here. In other words, any data signal for which the relationship between the bases and parameters obtained after dimensionality reduction is determinable is suitable for classification by the self-learning data classification system and method proposed by the present invention.

Various embodiments of the present invention are disclosed above, but they are not intended to limit the present invention. A person having ordinary skill in the art may make minor changes and refinements without departing from the spirit and scope of the present invention, for example by integrating the first subsystem and the second subsystem. The scope of protection of the present invention is therefore defined by the appended claims.

10  Self-learning data classification system
100  First subsystem
101  Database
102  Basis generation unit
1021  Matrixing processing unit
1022  Eigendecomposition unit
1023  Matrix low-rank approximation processing unit
1024  Multilayer perceptron unit
10241  Input layer
10242  Hidden layer
103  Basis output unit
104  Returned-parameter receiving unit
105  Returned-parameter evaluation unit
200  Second subsystem
201  Data signal measurement unit
202  Basis input unit
203  Parameter generation unit
204  Parameter return unit
205  Parameter classification calculation unit
210  Data signal under test
211  Input terminal
212  Output terminal
220  Classification result
601~607  Steps
701~705  Steps
801~806  Steps

FIG. 1 is a schematic diagram of the system architecture of a self-learning data classification system according to an embodiment of the present invention.
FIG. 2 is a schematic diagram of the functional structure of the basis generation unit of the first subsystem of a self-learning data classification system according to an embodiment of the present invention.
FIG. 3 is a schematic diagram of the inference process by which the basis generation unit of the first subsystem of a self-learning data classification system according to an embodiment of the present invention generates bases from known data signals in the training data.
FIG. 4 is a flowchart of the steps carried out by the first subsystem of a self-learning data classification system according to an embodiment of the present invention when the system as a whole performs data classification.
FIG. 5 is a flowchart of the steps carried out by the second subsystem of a self-learning data classification system according to an embodiment of the present invention when the system as a whole performs data classification.
FIG. 6 is a flowchart of the steps of a self-learning data classification method according to another embodiment of the present invention.
FIG. 7 is a schematic diagram of the known technique of reducing a high-dimensional signal to a combination of specific bases and their corresponding parameters.

Claims (10)

1. A self-learning data classification system for classifying a data signal under test, comprising: a first subsystem having: a database storing a group of training data, the data signals of the training data having the same attributes as the data signal under test; a basis generation unit, electrically connected to the database, configured to convert at least part of the data signals of the training data into a matrix and to compute a closest low-rank matrix of the matrix so as to generate a first set of bases representing the principal components of the data signals of the training data, the first set of bases containing at least one basis; and a returned-parameter receiving unit electrically connected to the basis generation unit; and a second subsystem, electrically connected or network-connected to the first subsystem, having: a data signal measurement unit configured to measure the data signal under test; a parameter generation unit, electrically connected to the data signal measurement unit, configured to generate parameters corresponding to the first set of bases according to the first set of bases and the data signal under test; a parameter return unit, electrically connected to the parameter generation unit, configured to return the parameters to the first subsystem; and a parameter classification calculation unit, electrically connected to the parameter generation unit, configured to compute a classification result of the data signal under test according to the parameters; wherein the returned-parameter receiving unit receives the parameters returned by the parameter return unit, and the basis generation unit generates, according to the returned parameters and the data signals of the training data, a second set of bases different from the first set of bases, the second set of bases containing at least one basis.

2. The self-learning data classification system of claim 1, wherein the basis generation unit has: a matrixing processing unit, electrically connected to the database, which converts said part of the data signals of the training data into the matrix; an eigendecomposition unit, electrically connected to the matrixing processing unit, which performs a singular value decomposition on the matrix to obtain a singular value and a corresponding singular vector; a matrix low-rank approximation processing unit, electrically connected to the eigendecomposition unit, which computes the closest low-rank matrix of the matrix from the singular value and the singular vector; and a multilayer perceptron unit, electrically connected to the matrix low-rank approximation processing unit, which receives the data signals corresponding to the closest low-rank matrix and outputs one of the first set of bases and the second set of bases.

3. The self-learning data classification system of claim 1, wherein the first subsystem further has: a returned-parameter evaluation unit, electrically connected to the basis generation unit and the returned-parameter receiving unit, configured to evaluate the returned parameters in successive rounds to determine whether the returned parameters exceed a set threshold.

4. The self-learning data classification system of claim 3, wherein the first subsystem further has: a basis output unit, electrically connected to the basis generation unit, configured to transmit the first set of bases and the second set of bases generated by the basis generation unit to the second subsystem over a network.

5. The self-learning data classification system of claim 4, wherein the second subsystem further has: a basis input unit, electrically connected to the data signal measurement unit, configured to receive the first set of bases and the second set of bases transmitted by the basis output unit.

6. The self-learning data classification system of any one of claims 1 to 5, wherein the data signal under test contains vital signs.

7. A self-learning data classification method using the self-learning data classification system of claim 1, for classifying the data signal under test, comprising the following steps: generating, according to the group of training data, the first set of bases representing the principal components of the data signals of the training data, the data signals of the training data having the same attributes as the data signal under test, and the first set of bases containing at least one basis; generating, according to a first set of the data signals under test and the first set of bases, a first set of parameters corresponding to the first set of bases; generating the second set of bases according to the first set of parameters and the data signals of the training data, the second set of bases containing at least one basis; and computing a classification result of the first set of the data signals under test according to the first set of parameters.

8. The self-learning data classification method of claim 7, wherein the steps of generating the first set of bases and the second set of bases are performed on the first subsystem, and the generation of the first set of parameters and the computation of the classification result of the first set of the data signals under test are performed on the second subsystem, the second subsystem being remote from the first subsystem and controlled by the first subsystem.

9. The self-learning data classification method of claim 7, further comprising the following step: determining whether the first set of parameters exceeds a set threshold.

10. The self-learning data classification method of claim 7, further comprising the following steps: generating, according to the second set of bases and a second set of the data signals under test, a second set of parameters corresponding to the second set of bases; and computing a classification result of the second set of the data signals under test according to the second set of parameters.
TW107116402A 2018-05-15 2018-05-15 Self-learning data classification system and method TWI682330B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
TW107116402A TWI682330B (en) 2018-05-15 2018-05-15 Self-learning data classification system and method
CN201910263690.2A CN110490216A (en) 2018-05-15 2019-04-03 A kind of self-study formula data sorting system and method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
TW107116402A TWI682330B (en) 2018-05-15 2018-05-15 Self-learning data classification system and method

Publications (2)

Publication Number Publication Date
TW201947465A TW201947465A (en) 2019-12-16
TWI682330B true TWI682330B (en) 2020-01-11

Family

ID=68545811

Family Applications (1)

Application Number Title Priority Date Filing Date
TW107116402A TWI682330B (en) 2018-05-15 2018-05-15 Self-learning data classification system and method

Country Status (2)

Country Link
CN (1) CN110490216A (en)
TW (1) TWI682330B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI808785B (en) * 2022-06-10 2023-07-11 英業達股份有限公司 Data splitting system and method for validating machine learning

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6993193B2 (en) * 2002-03-26 2006-01-31 Agilent Technologies, Inc. Method and system of object classification employing dimension reduction
US20140324739A1 (en) * 2010-06-09 2014-10-30 Heiko Claussen Systems and methods for learning of normal sensor signatures, condition monitoring and diagnosis
US20140343396A1 (en) * 2003-07-01 2014-11-20 Cardiomag Imaging, Inc. Use of Machine Learning for Classification of Magneto Cardiograms
CN104408476A (en) * 2014-12-08 2015-03-11 西安电子科技大学 Deep sparse main component analysis-based polarimetric SAR image classification method
CN105528516A (en) * 2015-12-01 2016-04-27 三门县人民医院 Clinic pathology data classification method based on combination of principal component analysis and extreme learning machine

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102317786A (en) * 2007-04-18 2012-01-11 特提斯生物科学公司 Diabetes correlativity biological marker and method of application thereof
KR101236040B1 (en) * 2011-04-06 2013-02-21 조선대학교산학협력단 Fingerprint verification apparatus and method therefor using PCA
CN103646252B (en) * 2013-12-05 2017-01-11 江苏大学 Optimized fuzzy learning vector quantization apple classification method
EP3332357A1 (en) * 2015-08-04 2018-06-13 Siemens Aktiengesellschaft Visual representation learning for brain tumor classification
CN107506787B (en) * 2017-07-27 2019-09-10 陕西师范大学 A kind of glue into concrete beam cracks classification method based on migration self study

Also Published As

Publication number Publication date
TW201947465A (en) 2019-12-16
CN110490216A (en) 2019-11-22

Similar Documents

Publication Publication Date Title
Reddi et al. Mlperf inference benchmark
Bargshady et al. Enhanced deep learning algorithm development to detect pain intensity from facial expression images
US11521716B2 (en) Computer-implemented detection and statistical analysis of errors by healthcare providers
US20200334809A1 (en) Computer-implemented machine learning for detection and statistical analysis of errors by healthcare providers
US20220164346A1 (en) Query-oriented approximate query processing based on machine learning techniques
CN114897102B (en) Industrial robot fault diagnosis method, system, device and storage medium
US9852378B2 (en) Information processing apparatus and information processing method to estimate cause-effect relationship between variables
Levashenko et al. Reliability estimation of healthcare systems using fuzzy decision trees
CN116894211A (en) System for generating human perceptible interpretive output, method and computer program for monitoring anomaly identification
HK1221541A1 (en) Method and device for detecting user quality
Lerch et al. Efficient quantum-enhanced classical simulation for patches of quantum landscapes
CN110291539A (en) Processing method, system, program and storage medium for generating learning data, and method and system for generating learning data
CN110766060A (en) Time series similarity calculation method, system and medium based on deep learning
Liu et al. An integrated framework for eye tracking-assisted task capability recognition of air traffic controllers with machine learning
Zhang et al. One step closer to unbiased aleatoric uncertainty estimation
TWI682330B (en) Self-learning data classification system and method
US11195056B2 (en) System improvement for deep neural networks
Nalci et al. Human action recognition with raw millimeter wave radar data
AlRababah Neural networks precision in technical vision systems
WO2021038840A1 (en) Object number estimation device, control method, and program
CN116880688A (en) Gesture recognition method and system based on multi-channel information fusion
CN115423186A (en) Cost prediction method, device, medium and equipment based on neural network model
Kim et al. DANDI: Diffusion as Normative Distribution for Deep Neural Network Input
Wang et al. Reference-based GAN Evaluation by Adaptive Inversion
JP7498688B2 (en) Model, device and method for estimating acceptability using relationship between change in target state and target state