TWI865935B

TWI865935B - Data processing method and apparatus

Info

Publication number: TWI865935B
Application number: TW111137595A
Authority: TW
Inventors: 何宇軒; 黃鈺文
Original assignee: 緯創資通股份有限公司
Priority date: 2022-06-16
Filing date: 2022-10-03
Publication date: 2024-12-11
Also published as: TWI849690B; TW202401306A; TW202400076A; CN117252268A

Abstract

A data prediction method and apparatus are provided. In the method, distances between data to be predicted and multiple data groups are determined. A first machine learning model, corresponding to the data group having the shortest distance to the data to be predicted, is selected from multiple machine learning models. The first machine learning model is used to predict the data to be predicted. The machine learning models are respectively trained using different data groups. Accordingly, the prediction result of the model can be improved.

Description

Data prediction method and device

The present invention relates to data prediction technology, and in particular to a data prediction method and device for machine learning.

Machine learning algorithms can analyze large amounts of data to infer the patterns in the data and thereby make predictions about unknown data. In recent years, machine learning has been widely applied in fields such as image recognition, natural language processing, outcome prediction, medical diagnosis, error detection, and speech recognition.

In view of this, embodiments of the present invention provide a data prediction method and device that group data for prediction in order to improve prediction accuracy.

The data prediction method of an embodiment of the present invention is applicable to machine learning and includes (but is not limited to) the following steps: determining distances between data to be predicted and multiple data groups; selecting, from multiple machine learning models, a first machine learning model corresponding to the data group having the shortest distance to the data to be predicted; and predicting the data to be predicted using the first machine learning model. The machine learning models are respectively trained using different data groups.

The data prediction device of an embodiment of the present invention includes (but is not limited to) a memory and a processor. The memory stores program code. The processor is coupled to the memory. The processor is configured to load the program code to execute: determining distances between data to be predicted and multiple data groups; selecting, from multiple machine learning models, a first machine learning model corresponding to the data group having the shortest distance to the data to be predicted; and predicting the data to be predicted using the first machine learning model. The machine learning models are respectively trained using different data groups.

Based on the above, the data prediction method and device of the embodiments of the present invention find the first machine learning model corresponding to the data group most similar to the data to be predicted, and predict the data to be predicted accordingly. This helps to improve the accuracy, sensitivity, and specificity of machine learning.

To make the above features and advantages of the present invention easier to understand, embodiments are described in detail below with reference to the accompanying drawings.

10: Electronic device

11: Memory

12: Processor

15: Sensor

S210~S250, S910~S930: Steps

FIG. 1 is a block diagram of the components of a data prediction device according to an embodiment of the present invention.

FIG. 2 is a flow chart of a data prediction method according to an embodiment of the present invention.

FIG. 3 is a schematic diagram of an analysis result according to an embodiment of the present invention.

FIG. 4 is a principal component distribution diagram according to an embodiment of the present invention.

FIG. 5 is a schematic diagram of grouping by hierarchical clustering according to an embodiment of the present invention.

FIG. 6 is a schematic diagram of the verification results of separately training the first group according to an embodiment of the present invention.

FIG. 7 is a schematic diagram of the verification results of separately training the second group according to an embodiment of the present invention.

FIG. 8 is a schematic diagram of the verification results of jointly training multiple groups according to an embodiment of the present invention.

FIG. 9 is a flow chart of data prediction according to an embodiment of the present invention.

FIG. 1 is a block diagram of the components of a data prediction device 10 according to an embodiment of the present invention. Referring to FIG. 1, the data prediction device 10 includes (but is not limited to) a memory 11 and a processor 12. The data prediction device 10 may be a mobile phone, a tablet computer, a laptop computer, a desktop computer, a voice assistant device, a smart home appliance, a wearable device, an in-vehicle device, or another electronic device.

The memory 11 may be any type of fixed or removable random access memory (RAM), read-only memory (ROM), flash memory, hard disk drive (HDD), solid-state drive (SSD), or similar component. In one embodiment, the memory 11 stores program code, software modules, configurations, data, or files (for example, data, models, or features), which are described in detail in the subsequent embodiments.

The processor 12 is coupled to the memory 11. The processor 12 may be a central processing unit (CPU), a graphics processing unit (GPU), another programmable general-purpose or special-purpose microprocessor, a digital signal processor (DSP), a programmable controller, a field programmable gate array (FPGA), an application-specific integrated circuit (ASIC), a neural network accelerator, another similar component, or a combination of the above. In one embodiment, the processor 12 executes all or part of the operations of the data prediction device 10, and can load and execute the program code, software modules, files, and data stored in the memory 11. In some embodiments, some operations of the method of the embodiments of the present invention may be implemented by different processors 12 or by the same processor 12.

In one embodiment, the data prediction device 10 further includes a sensor 15. The processor 12 is coupled to the sensor 15. For example, the sensor 15 is connected to the processor 12 via USB, Thunderbolt, Wi-Fi, Bluetooth, or another wired or wireless communication technology. As another example, the sensor 15 is built into the data prediction device 10. The sensor 15 may be a radar, a microphone, a temperature sensor, a humidity sensor, an image sensor, a motion sensor, or another type of sensor. In one embodiment, the sensor 15 performs sensing to obtain sensing data. In one embodiment, the sensing data is time-dependent data, that is, data that relates to a time series or to continuous time, or that is recorded at multiple time points. For example, the sensing data is a radar sensing result (for example, an in-phase/quadrature (IQ) signal), a sound signal, or a sequence of consecutive images.

In the following, the method of the embodiments of the present invention is described with reference to the devices, components, and modules of the data prediction device 10. The individual steps of the method may be adjusted according to the implementation and are not limited to those described here.

FIG. 2 is a flow chart of a data prediction method according to an embodiment of the present invention. Referring to FIG. 2, the processor 12 performs a dimensionality reduction analysis on multiple feature sets to obtain an analysis result (step S210). Specifically, each feature set includes one or more features. The type of feature may differ according to the type of sensing data produced by the sensor 15. Taking the IQ signal of a radar as an example, the features may be the variance between different channels or waveform-related features. As another example, the features may be acoustic features such as the zero-crossing rate (ZCR), pitch, or Mel-frequency cepstral coefficients (MFCC).

In one embodiment, the processor 12 may convert multiple pieces of sensing data into the feature sets. For example, an IQ signal is converted into the variance between different channels and waveform-related features. As another example, a sound signal is converted into ZCR, pitch, or MFCC.

For example, Table (1) lists IQ sensing data of the radar. The processor 12 may reshape the sensing data of Table (1) into matrix form, for example a 300×500 matrix whose elements are I or Q data.

In another embodiment, the processor 12 may download or receive, through a communication transceiver (not shown), sensing data from an external sensor or feature sets generated by an external computing device.

Different feature sets may correspond to the sensing data of different subjects or different target objects. For example, a first feature set is converted from the sensing data of a first subject, and a second feature set is converted from the sensing data of a second subject. Alternatively, different feature sets may correspond to the sensing data of the same subject or the same target object at different times or in different environments. For example, a third feature set corresponds to the sensing data of a third subject in a first time period, and a fourth feature set corresponds to the sensing data of the third subject in a second time period.

In one embodiment, the processor 12 may label one or more feature sets, for example with events such as hypopnea, wakefulness, or apnea. However, the label content may differ according to the feature type, and the embodiments of the present invention are not limited in this respect.

Dimensionality reduction analysis is used to reduce the number of features. That is, each feature is regarded as one dimension, so reducing dimensions also reduces features. In one embodiment, the dimensionality reduction analysis is principal component analysis (PCA) or principal coordinates analysis (PCoA). PCA applies an orthogonal transformation to the observed values of a set of possibly correlated variables (in this embodiment, the features), linearly transforming them and projecting them onto the values of a set of linearly uncorrelated variables. These uncorrelated variables are called principal components. In other words, PCA finds the most important elements and structure among multiple features. Unlike PCA, PCoA projects a distance matrix, which records the difference/distance between each pair of observations and is obtained by applying a distance algorithm to the observations. PCoA then finds the principal coordinates of that distance matrix.
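The PCA step can be illustrated with a minimal sketch. It assumes that the "proportion" of a principal component means its share of the total variance (the usual explained-variance ratio), which the text does not state explicitly. The function names and the sample observations are hypothetical, and power iteration is used here only to keep the example dependency-free; it is not presented as the method of the embodiment.

```python
import math

def covariance_matrix(rows):
    """Sample covariance matrix of rows (observations x features)."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    return [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
            for i in range(d)]

def top_eigenvalue(cov, iterations=500):
    """Estimate the largest eigenvalue of a covariance matrix by power iteration.

    Caveat: the fixed start vector could, in rare cases, be orthogonal to the
    leading eigenvector, in which case the estimate would be too small.
    """
    d = len(cov)
    v = [float(i + 1) for i in range(d)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return 0.0
        v = [x / norm for x in w]
    # Rayleigh quotient of the converged unit vector.
    return sum(v[i] * sum(cov[i][j] * v[j] for j in range(d)) for i in range(d))

def pc1_proportion(rows):
    """Share of total variance carried by the first principal component."""
    cov = covariance_matrix(rows)
    total = sum(cov[i][i] for i in range(len(cov)))
    if total == 0.0:
        return 0.0
    return top_eigenvalue(cov) / total

# Hypothetical observations: five samples with two correlated features.
observations = [[1.0, 2.1], [2.0, 3.9], [3.0, 6.2], [4.0, 8.0], [5.0, 9.8]]
print(round(pc1_proportion(observations), 3))
```

In practice a numerical library would compute all components and their proportions at once; the sketch only shows where a figure such as the proportion of PC1 comes from.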

The analysis result may be the principal components and their proportions, or the principal coordinates and their proportions, where each proportion is that of the corresponding principal component or principal coordinate. For example, FIG. 3 is a schematic diagram of an analysis result according to an embodiment of the present invention. Referring to FIG. 3, assume that the sensing data is sleep data sensed by a continuous wave (CW) radar, and that the corresponding labeled verification data is produced by overnight polysomnography (PSG). The targets of comparison are sleep events such as hypopnea, wakefulness, or apnea events. That is, the radar is used to predict sleep events. This embodiment analyzes data from 32 subjects. After the radar data of the 32 subjects is converted into features and processed by PCA/PCoA, the principal component structure shown in FIG. 3 is obtained. The analysis result includes principal components PC1~PC11 and their proportions, and principal component PC1 has the highest proportion.

In other embodiments, the dimensionality reduction analysis may be linear discriminant analysis (LDA), t-distributed stochastic neighbor embedding (t-SNE), or another dimensionality reduction technique. The analysis result then includes the reduced features or dimensions and their proportions.

Referring to FIG. 2, the processor 12 may normalize the feature sets according to the analysis result to generate multiple normalized feature sets (step S220). Specifically, normalization scales the feature values proportionally so that the scaled values fall within a specific interval (for example, [0,1] or [0,10]). That is, the value of each feature in a feature set is scaled proportionally into the specific interval.
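As a minimal sketch of this kind of scaling, the following function maps a list of values into a chosen interval with min-max scaling. The text only says that values are scaled proportionally into an interval, so min-max scaling is an assumption, and the function name and sample values are hypothetical.

```python
def min_max_scale(values, lower=0.0, upper=1.0):
    """Scale values proportionally so they fall within [lower, upper]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical; map them to the lower bound.
        return [lower for _ in values]
    span = upper - lower
    return [lower + (v - lo) / (hi - lo) * span for v in values]

# Hypothetical feature values scaled into [0, 1] and into [0, 10].
print(min_max_scale([2.0, 4.0, 6.0]))
print(min_max_scale([2.0, 4.0, 6.0], 0.0, 10.0))
```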

In one embodiment, the processor 12 selects one or more first principal components from the multiple principal components and normalizes the feature sets according to the first principal components. For example, the processor 12 sets the maximum and minimum values of the interval and normalizes each principal component so that they share a common reference point.

In one embodiment, the first principal component is the principal component with the highest proportion. For example, the proportion of principal component PC1 in FIG. 3 is much larger than that of the other principal components PC2~PC11, so principal component PC1 can be selected for the subsequent normalization.

In another embodiment, the first principal component is the principal component with the highest proportion or the principal component with the second highest proportion, where the difference between the two proportions is less than a threshold (for example, 3%, 5%, or 10%). For example, if the difference between the highest proportion and the second highest proportion is within 5%, the principal component with the second highest proportion is also selected. If any other principal component has a proportion whose difference from the highest proportion is also less than the threshold, it is likewise taken into account in the subsequent normalization.
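The threshold rule can be sketched as follows. The proportions are hypothetical values (FIG. 3 is not reproduced here), the threshold is assumed to be expressed in the same units as the proportions (0.05 for five percentage points), and `select_components` is an illustrative name.

```python
def select_components(proportions, threshold):
    """Return the top component plus any component whose proportion
    is within `threshold` of the top proportion, highest first."""
    top = max(proportions.values())
    ranked = sorted(proportions.items(), key=lambda item: item[1], reverse=True)
    return [name for name, share in ranked if top - share < threshold]

# Hypothetical proportions: PC2 is within 5 points of PC1, PC3 is not.
close_case = {"PC1": 0.40, "PC2": 0.37, "PC3": 0.10}
print(select_components(close_case, 0.05))

# Hypothetical proportions where PC1 dominates.
dominant_case = {"PC1": 0.80, "PC2": 0.10, "PC3": 0.05}
print(select_components(dominant_case, 0.05))
```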

In one embodiment, the processor 12 may rank the feature sets through percentile transformation, that is, by converting feature values into ranks. For example, Table (2) lists the features of one feature set, and Table (3) lists the ranks obtained after converting Table (2).
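A minimal sketch of converting feature values into ranks is shown below. The contents of Tables (2) and (3) are not available here, so the input values are hypothetical. The sketch also assumes that rank 1 goes to the smallest value and that ties are broken by original position; the text specifies neither.

```python
def rank_transform(values):
    """Replace each value with its rank (1 = smallest value).

    Ties are broken by original position because sorted() is stable.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        ranks[index] = rank
    return ranks

# Hypothetical feature values of one feature set.
print(rank_transform([0.3, 1.2, 0.7, 5.4]))
```

After this conversion, features measured on very different scales become directly comparable, since every feature set is expressed as a permutation of ranks.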

FIG. 4 is a principal component distribution diagram according to an embodiment of the present invention. Referring to FIG. 4, the horizontal axis shows the features, arranged by index, and the vertical axis shows the numbers of the different subjects. The features of different subjects are not identical. For example, the behavior of the 17th and 18th features (that is, their degree of importance, shown in the figure as different gray levels) for subjects 5 and 10 differs from that of the other subjects.

Referring to FIG. 2, the processor 12 generates a distance relationship of the normalized feature sets (step S230). Specifically, the distance relationship includes the distance between each two of the normalized feature sets. The processor 12 may project the features of the normalized feature sets into the same space to form coordinates, and calculate the distance in that space between the features of different normalized feature sets (that is, the distance between two coordinates).

In one embodiment, the distance relationship is a distance matrix, and each element of the distance matrix is the distance between the features of two normalized feature sets. The distance algorithm may be Euclidean distance, cosine similarity, or Kullback-Leibler (KL) divergence. For example, suppose the first normalized feature set is [1.5,2.2], the second normalized feature set is [0.1,1.6], and the third normalized feature set is [5.7,4.3]. The distance matrix is then [1.52,4.7,6.22], listing the first-to-second, first-to-third, and second-to-third distances. Taking the Euclidean distance as an example, the square root of (1.5-0.1)^2+(2.2-1.6)^2 is 1.52, and the remaining entries are computed in the same way.
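The worked example can be reproduced with a short sketch. The three feature sets are the ones given in the text; the function names are illustrative, and the full symmetric matrix is computed here even though the text lists only the three distinct pairwise distances.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equally sized feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def distance_matrix(feature_sets):
    """Full symmetric matrix of pairwise Euclidean distances."""
    n = len(feature_sets)
    return [[euclidean(feature_sets[i], feature_sets[j]) for j in range(n)]
            for i in range(n)]

# Normalized feature sets from the example in the text.
feature_sets = [[1.5, 2.2], [0.1, 1.6], [5.7, 4.3]]
matrix = distance_matrix(feature_sets)
for row in matrix:
    print([round(value, 2) for value in row])
```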

The distance relationship is not limited to matrix form. In other embodiments, the distance relationship may be a lookup table, a mathematical conversion formula, or another relationship that records the distances between different feature sets.

Referring to FIG. 2, the processor 12 groups the feature sets according to the distance relationship to generate multiple data groups (step S240). Specifically, each data group includes one or more feature sets. The distance relationship indicates the similarity between different feature sets, and grouping assigns feature sets with higher similarity to the same data group. The grouping method may be K-means clustering, hierarchical clustering, or fuzzy clustering.

For example, FIG. 5 is a schematic diagram of grouping by hierarchical clustering according to an embodiment of the present invention. Referring to FIG. 5, each feature set number corresponds to one feature set (for example, one subject). The processor 12 uses hierarchical clustering to group the feature sets that are closest to one another into one of the data groups. The feature sets can be split into two groups between numbers 28 and 16 (for example, the feature sets corresponding to subjects 16 and 28). Note that "closest" refers to the result of a comparison with a distance threshold. If the distance between two feature sets is less than the distance threshold, they are regarded as closest to each other; otherwise, they are regarded as not close.

In one embodiment, the processor 12 may determine the number of data groups, determine a grouping distance according to the number of groups, and group the feature sets according to the grouping distance. Taking FIG. 5 as an example, if the number of groups is 2, the grouping distance is 60. The distances between the feature sets numbered 5, 12, 11, 27, 19, 23, 30, 3, and 28 (for example, the feature sets corresponding to subjects 5, 12, 11, 27, 19, 23, 30, 3, and 28) are within 60, so they are all assigned to the same data group. If the number of groups is 3, the grouping distance is 50. The distances between the feature sets numbered 16, 24, 10, 15, and 29 are within 50, so they are all assigned to the same data group.
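Grouping to a chosen number of groups can be sketched with a small agglomerative procedure that keeps merging the two closest clusters until the requested count remains. This is an illustration under assumptions: the text does not state the linkage criterion, so single linkage (closest pair of members) is assumed, and the two-dimensional points are hypothetical stand-ins for normalized feature sets, not the data of FIG. 5.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equally sized feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def agglomerate(points, group_count):
    """Single-linkage agglomerative clustering down to group_count clusters.

    Returns a list of clusters, each a list of indices into `points`.
    """
    if not 1 <= group_count <= len(points):
        raise ValueError("group_count must be between 1 and len(points)")
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > group_count:
        best = None  # (distance, index of cluster a, index of cluster b)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(euclidean(points[i], points[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters

# Hypothetical normalized feature sets (one point per subject).
points = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0], [50.0, 50.0]]
print(agglomerate(points, 2))
print(agglomerate(points, 3))
```

Stopping at a target number of clusters corresponds to cutting the dendrogram at some height, which is the role the grouping distance plays in the description above.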

Referring to FIG. 2, the processor 12 trains multiple machine learning models using the data groups respectively (step S250). Specifically, once the grouping result is obtained, each data group can be trained separately from the other data groups, so the machine learning models are each trained using a different data group. The processor 12 may train each machine learning model using the feature sets of the corresponding data group (that is, the feature sets converted from the sensing data) or using the unconverted sensing data. For example, the feature sets of the first data group are used to train a first machine learning model, and the feature sets of the second data group are used to train a second machine learning model. The first data group is not used to train the second machine learning model. The machine learning algorithm may be deep learning, a decision tree, a recurrent neural network (RNN), or another algorithm.

The following verification results show that the grouped training of the embodiments of the present invention benefits machine learning training.

FIG. 6 is a schematic diagram of the verification results of separately training the first group according to an embodiment of the present invention, FIG. 7 is a schematic diagram of the verification results of separately training the second group, and FIG. 8 is a schematic diagram of the verification results of jointly training multiple groups. Referring to FIG. 6, FIG. 7, and FIG. 8, accuracy is the rate at which the predictions produced by each machine learning model match the actual results. Sensitivity is the proportion of actually positive samples that are judged positive. Specificity is the proportion of actually negative samples that are judged negative.
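These three metrics follow directly from the counts of true and false positives and negatives. The sketch below computes them for binary labels; the label lists are hypothetical and are not the verification data behind FIG. 6 to FIG. 8.

```python
def binary_metrics(actual, predicted):
    """Return (accuracy, sensitivity, specificity) for binary 0/1 labels.

    Sensitivity or specificity is None when the corresponding class
    does not occur in `actual`.
    """
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    sensitivity = tp / (tp + fn) if (tp + fn) else None
    specificity = tn / (tn + fp) if (tn + fp) else None
    return accuracy, sensitivity, specificity

# Hypothetical labels: 1 = event present, 0 = event absent.
actual = [1, 1, 1, 0, 0]
predicted = [1, 1, 0, 0, 1]
print(binary_metrics(actual, predicted))
```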

FIG. 6 shows the verification results of training with the feature sets numbered 5, 12, 11, 27, 19, 23, 30, 3, and 28 in FIG. 5 (for example, the feature sets or the original sensing data corresponding to subjects 5, 12, 11, 27, 19, 23, 30, 3, and 28). FIG. 7 shows the verification results of training with the other feature sets in FIG. 5 (for example, the feature sets or the original sensing data corresponding to the remaining subjects). FIG. 8 shows the verification results of jointly training with all feature sets in FIG. 5 (for example, the feature sets or the original sensing data corresponding to all subjects). Grouped training (FIG. 6 and FIG. 7) outperforms joint training (FIG. 8) in accuracy, sensitivity, and specificity. Taking accuracy as an example, under joint training the accuracy shown in FIG. 8 converges to approximately 0.7, whereas under grouped training the accuracy shown in FIG. 6 and FIG. 7 converges above 0.7. In particular, the accuracy in FIG. 6, which covers a single data group, converges to approximately 0.9.

In addition to optimizing training, the embodiments of the present invention can optimize model prediction. FIG. 9 is a flow chart of data prediction according to an embodiment of the present invention. Referring to FIG. 9, the processor 12 may determine the distances between the data to be predicted and the data groups (step S910). Specifically, the processor 12 first obtains the data to be predicted, for which the foregoing description of the sensing data applies and is not repeated here. As needed, the processor 12 converts the data to be predicted into a feature set to be predicted, for which the foregoing description of converting sensing data into feature sets applies and is not repeated here. The processor 12 then determines the distances between the feature set to be predicted and the multiple data groups.

For example, suppose the representative values (for example, the mean, median, or another statistic) of the first data group are [8.16,9.8,3.7,15.54,2.74,4.04,16.82,4.56,21,11.88,12.78,11.1,9.54,7.22,7.24,18.34,17.04,4.24,20,12.1,13.16], the representative values of the second data group are [4.61,6.42,9.95,5.7,4,6.61,2.85,10.28,21,15.85,14.66,12.047,8.28,10.38,9.95,18.85,16.42,3.57,20,13.33,16.09], and the feature set to be predicted is [10,13,6,16,2,3,17,5,21,9,15,12,8,7,4,19,18,1,20,11,14]. Taking the Euclidean distance as an example, the distance between the feature set to be predicted and the first data group is 7.855, and the distance between the feature set to be predicted and the second data group is 23.495.

The processor 12 may select, from the multiple machine learning models, the first machine learning model corresponding to the data group having the shortest distance to the data to be predicted (step S920), and use the first machine learning model to predict the data to be predicted (step S930). For example, 7.855 is less than 23.495, so the data group with the shortest distance to the feature set to be predicted is the first data group. The processor 12 may load the first machine learning model of the first data group and input the data to be predicted into the loaded model to obtain a prediction result. If the data to be predicted is a radar sensing result, the prediction result may be a sleep event. However, the prediction result may vary according to actual needs.
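The selection in steps S910 and S920 can be sketched with the numbers from the example above. The representative values and the feature set to be predicted are copied from the text; the group names and function names are hypothetical, and loading the trained model (step S930) is outside the scope of the sketch.

```python
import math

# Representative values of each data group, as given in the text.
GROUP_REPRESENTATIVES = {
    "first": [8.16, 9.8, 3.7, 15.54, 2.74, 4.04, 16.82, 4.56, 21.0, 11.88,
              12.78, 11.1, 9.54, 7.22, 7.24, 18.34, 17.04, 4.24, 20.0, 12.1,
              13.16],
    "second": [4.61, 6.42, 9.95, 5.7, 4.0, 6.61, 2.85, 10.28, 21.0, 15.85,
               14.66, 12.047, 8.28, 10.38, 9.95, 18.85, 16.42, 3.57, 20.0,
               13.33, 16.09],
}

# Feature set to be predicted, as given in the text.
QUERY = [10, 13, 6, 16, 2, 3, 17, 5, 21, 9, 15, 12, 8, 7, 4, 19, 18, 1, 20,
         11, 14]

def euclidean(a, b):
    """Euclidean distance between two equally sized feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def select_group(query, representatives):
    """Return the name of the nearest group and all group distances."""
    distances = {name: euclidean(query, rep)
                 for name, rep in representatives.items()}
    nearest = min(distances, key=distances.get)
    return nearest, distances

nearest, distances = select_group(QUERY, GROUP_REPRESENTATIVES)
print(nearest, {name: round(d, 3) for name, d in distances.items()})
```

The returned group name would then serve as the key for looking up the model trained on that group, for example in a dictionary that maps group names to loaded models.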

Note that, in one embodiment, in response to the distances between multiple data groups and the data to be predicted being less than a lower distance limit or greater than an upper distance limit, the machine learning models of all of these data groups may be selected to predict the result for the data to be predicted. In another embodiment, in response to the distances between multiple data groups and the data to be predicted all being the same, or being less than a preset value, the processor 12 may load a machine learning model jointly trained on multiple data groups to perform the prediction.

綜上所述，在本發明實施例的資料預測方法及裝置中，依據維度縮減的結果正規化特徵集合，並進一步分群。接著，使用不同資料群組訓練不同機器學習模型。此外，選擇距離相近的資料群組對應的機器學習模型進行預測。藉此，可提升訓練及預測的效果。 In summary, in the data prediction method and device of the embodiment of the present invention, the feature set is normalized according to the result of dimension reduction and further grouped. Then, different machine learning models are trained using different data groups. In addition, machine learning models corresponding to data groups with similar distances are selected for prediction. In this way, the effect of training and prediction can be improved.

Although the present invention has been disclosed above by way of embodiments, they are not intended to limit the present invention. Anyone with ordinary knowledge in the relevant technical field may make some changes and modifications without departing from the spirit and scope of the present invention. Therefore, the scope of protection of the present invention shall be defined by the appended claims.

S910~S920: Steps

Claims

A data prediction method, applicable to machine learning, comprising: grouping a plurality of feature sets; performing a dimensionality reduction analysis on the feature sets to obtain an analysis result, wherein the dimensionality reduction analysis is principal component analysis (PCA) or principal coordinates analysis (PCoA), and the analysis result comprises the proportion of a plurality of principal components; normalizing the feature sets according to the analysis result to generate a plurality of normalized feature sets, and generating a plurality of data groups, wherein each of the normalized feature sets comprises at least one feature, and each of the data groups comprises at least one of the normalized feature sets; training a plurality of machine learning models respectively using the data groups; determining distances between a data to be predicted and the data groups; determining the data group with the shortest distance to the data to be predicted as a first data group, wherein a first machine learning model has been trained using the first data group; selecting the first machine learning model from the machine learning models; and predicting the data to be predicted using the first machine learning model, wherein the machine learning models are trained using different data groups respectively.

The data prediction method as described in claim 1 further includes: generating a distance relationship among the normalized feature sets, wherein the distance relationship includes the distance between two of the normalized feature sets; and grouping the feature sets according to the distance relationship to generate the data groups.

The data prediction method as described in claim 2, wherein the step of normalizing the feature sets based on the analysis results includes: selecting a first principal component from the principal components; and normalizing the feature sets based on the first principal component.

The data prediction method as described in claim 3, wherein the first principal component is the principal component with the highest proportion among the principal components.

The data prediction method as described in claim 3, wherein the first principal component is the principal component with the highest proportion or the second highest proportion among the principal components, and the difference between the principal component with the highest proportion and the principal component with the second highest proportion is less than a threshold value.

The data prediction method as described in claim 2, wherein the distance relationship is a distance matrix, and each element in the distance matrix is the distance between two features in the normalized feature set.

The data prediction method as described in claim 2, wherein the step of grouping the feature sets according to the distance relationship includes: clustering the feature sets with the closest distance into one of the data groups using a hierarchical clustering method.

The data prediction method as described in claim 7 further includes: determining the number of groups of the data groups; determining a clustering distance based on the number of groups; and clustering the feature sets based on the clustering distance.

The data prediction method as described in claim 2 further includes: converting a plurality of sensing data into the feature sets, wherein the sensing data are time-dependent data; and using the feature sets or the sensing data corresponding to each data group to train the corresponding machine learning model.

The data prediction method as described in claim 9, wherein each of the sensing data is a sensing result of a radar.

A data prediction device, comprising: a memory storing a program code; and a processor coupled to the memory and configured to load the program code to execute: grouping a plurality of feature sets; performing a dimensionality reduction analysis on the feature sets to obtain an analysis result, wherein the dimensionality reduction analysis is principal component analysis (PCA) or principal coordinates analysis (PCoA), and the analysis result comprises the proportion of a plurality of principal components; normalizing the feature sets according to the analysis result to generate a plurality of normalized feature sets, and generating a plurality of data groups, wherein each of the normalized feature sets comprises at least one feature, and each of the data groups comprises at least one of the normalized feature sets; training a plurality of machine learning models using the data groups respectively; determining distances between a data to be predicted and the data groups; determining that the data group with the shortest distance to the data to be predicted is a first data group, wherein a first machine learning model has been trained using the first data group; selecting the first machine learning model from the machine learning models; and predicting the data to be predicted using the first machine learning model, wherein the machine learning models are trained using different data groups respectively.

The data prediction device as described in claim 11, wherein the processor further executes: generating a distance relationship of the normalized feature sets, wherein the distance relationship includes the distance between two of the normalized feature sets; and clustering the feature sets according to the distance relationship to generate the data groups.

The data prediction device as described in claim 12, wherein the processor further performs: selecting a first principal component from the principal components; and normalizing the feature sets according to the first principal component.

The data prediction device as described in claim 13, wherein the first principal component is the principal component with the highest proportion among the principal components.

The data prediction device as described in claim 13, wherein the first principal component is the principal component with the highest proportion or the second highest proportion among the principal components, and the difference between the principal component with the highest proportion and the principal component with the second highest proportion is less than a threshold value.

The data prediction device as described in claim 12, wherein the distance relationship is a distance matrix, and each element in the distance matrix is the distance between two features in the normalized feature set.

The data prediction device as described in claim 12, wherein the processor further performs: using a hierarchical clustering method to cluster the feature sets with the closest distance into one of the data groups.

The data prediction device as described in claim 17, wherein the processor further performs: determining the number of groups of the data groups; determining a clustering distance based on the number of groups; and clustering the feature sets based on the clustering distance.

The data prediction device as described in claim 18, wherein the processor further performs: converting a plurality of sensing data into the feature sets, wherein the sensing data are time-dependent data; and using the feature sets or the sensing data corresponding to each data group to train the corresponding machine learning model.

The data prediction device as described in claim 19, wherein each of the sensing data is a sensing result of a radar.