TWI865935B - Data processing method and apparatus - Google Patents
Data processing method and apparatus
- Publication number
- TWI865935B
- Authority
- TW
- Taiwan
- Prior art keywords
- data
- feature sets
- distance
- machine learning
- principal component
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/04—Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
Landscapes
- Engineering & Computer Science (AREA)
- Business, Economics & Management (AREA)
- Theoretical Computer Science (AREA)
- Software Systems (AREA)
- General Physics & Mathematics (AREA)
- Strategic Management (AREA)
- Human Resources & Organizations (AREA)
- Physics & Mathematics (AREA)
- Economics (AREA)
- Development Economics (AREA)
- Medical Informatics (AREA)
- Mathematical Physics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Computation (AREA)
- Computing Systems (AREA)
- Game Theory and Decision Science (AREA)
- Data Mining & Analysis (AREA)
- General Engineering & Computer Science (AREA)
- Entrepreneurship & Innovation (AREA)
- Marketing (AREA)
- Operations Research (AREA)
- Quality & Reliability (AREA)
- Tourism & Hospitality (AREA)
- General Business, Economics & Management (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
- Measurement Of The Respiration, Hearing Ability, Form, And Blood Characteristics Of Living Organisms (AREA)
Abstract
Description
The present invention relates to a data prediction technology, and in particular to a data prediction method and apparatus for machine learning.
Machine learning algorithms analyze large amounts of data to infer the patterns in that data and thereby make predictions about unknown data. In recent years, machine learning has been widely applied in fields such as image recognition, natural language processing, outcome prediction, medical diagnosis, error detection, and speech recognition.
In view of this, embodiments of the present invention provide a data prediction method and apparatus that predict data group by group in order to improve prediction accuracy.
The data prediction method of an embodiment of the present invention is applicable to machine learning and includes (but is not limited to) the following steps: determining the distances between data to be predicted and multiple data groups; selecting, from multiple machine learning models, a first machine learning model corresponding to the data group that has the shortest distance to the data to be predicted; and predicting the data to be predicted using the first machine learning model. The machine learning models are each trained using a different data group.
The data prediction apparatus of an embodiment of the present invention includes (but is not limited to) a memory and a processor. The memory stores program code. The processor is coupled to the memory and is configured to load the program code to execute: determining the distances between data to be predicted and multiple data groups; selecting, from multiple machine learning models, a first machine learning model corresponding to the data group that has the shortest distance to the data to be predicted; and predicting the data to be predicted using the first machine learning model. The machine learning models are each trained using a different data group.
Based on the above, the data prediction method and apparatus according to embodiments of the present invention find the first machine learning model corresponding to the data group most similar to the data to be predicted, and use that model to make the prediction. This helps improve the accuracy, sensitivity, and specificity of machine learning.
To make the above features and advantages of the present invention easier to understand, embodiments are described in detail below with reference to the accompanying drawings.
10: Electronic device
11: Memory
12: Processor
15: Sensor
S210~S250, S910~S930: Steps
FIG. 1 is a block diagram of the components of a data prediction apparatus according to an embodiment of the present invention.
FIG. 2 is a flowchart of a data prediction method according to an embodiment of the present invention.
FIG. 3 is a schematic diagram of an analysis result according to an embodiment of the present invention.
FIG. 4 is a principal component distribution diagram according to an embodiment of the present invention.
FIG. 5 is a schematic diagram of grouping by hierarchical clustering according to an embodiment of the present invention.
FIG. 6 is a schematic diagram of the validation results of separate training on the first group according to an embodiment of the present invention.
FIG. 7 is a schematic diagram of the validation results of separate training on the second group according to an embodiment of the present invention.
FIG. 8 is a schematic diagram of the validation results of joint training on multiple groups according to an embodiment of the present invention.
FIG. 9 is a flowchart of data prediction according to an embodiment of the present invention.
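FIG. 1 is a block diagram of the components of a data prediction apparatus 10 according to an embodiment of the present invention. Referring to FIG. 1, the data prediction apparatus 10 includes (but is not limited to) a memory 11 and a processor 12. The data prediction apparatus 10 may be a mobile phone, a tablet computer, a notebook computer, a desktop computer, a voice assistant device, a smart home appliance, a wearable device, an in-vehicle device, or another electronic device.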
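The memory 11 may be any type of fixed or removable random access memory (RAM), read-only memory (ROM), flash memory, hard disk drive (HDD), solid-state drive (SSD), or similar component. In one embodiment, the memory 11 stores program code, software modules, configurations, data, or files (e.g., data, models, or features), which are detailed in the embodiments below.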
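The processor 12 is coupled to the memory 11. The processor 12 may be a central processing unit (CPU), a graphics processing unit (GPU), another programmable general-purpose or special-purpose microprocessor, a digital signal processor (DSP), a programmable controller, a field-programmable gate array (FPGA), an application-specific integrated circuit (ASIC), a neural network accelerator, another similar component, or a combination of the above. In one embodiment, the processor 12 performs all or part of the operations of the data prediction apparatus 10 and can load and execute the program code, software modules, files, and data stored in the memory 11. In some embodiments, some operations of the method of the embodiments of the present invention may be implemented by different processors 12 or by the same processor 12.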
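In one embodiment, the data prediction apparatus 10 further includes a sensor 15. The processor 12 is coupled to the sensor 15. For example, the sensor 15 is connected to the processor 12 through USB, Thunderbolt, Wi-Fi, Bluetooth, or another wired or wireless communication technology. As another example, the sensor 15 is built into the data prediction apparatus 10. The sensor 15 may be a radar, a microphone, a temperature sensor, a humidity sensor, an image sensor, a motion sensor, or another type of sensor. In one embodiment, the sensor 15 performs sensing to obtain sensing data. In one embodiment, the sensing data is time-dependent data, that is, data recorded as a time series, over continuous time, or at multiple time points. For example, the sensing data is a radar sensing result (e.g., in-phase/quadrature (IQ) signals), a sound signal, or consecutive images.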
In the following, the method of the embodiments of the present invention is described with reference to the devices, components, and modules of the data prediction apparatus 10. Each process of the method may be adjusted according to the implementation and is not limited to what is described here.
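FIG. 2 is a flowchart of a data prediction method according to an embodiment of the present invention. Referring to FIG. 2, the processor 12 performs dimensionality reduction analysis on multiple feature sets to obtain an analysis result (step S210). Specifically, each feature set includes one or more features. The types of features may differ depending on the type of sensing data from the sensor 15. Taking radar IQ signals as an example, the features may be the variance between different channels or waveform-related features. As another example, acoustic features include the zero-crossing rate (ZCR), pitch, and Mel-frequency cepstral coefficients (MFCC).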
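In one embodiment, the processor 12 may convert multiple pieces of sensing data into the feature sets. For example, IQ signals are converted into the variance between different channels and waveform-related features. As another example, sound signals are converted into ZCR, pitch, or MFCC.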
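As an illustration of converting raw samples into one feature, the following is a minimal sketch of a zero-crossing rate computation. The function name and the convention of treating zero as non-negative are assumptions made for this example; the patent does not specify how ZCR is computed.

```python
def zero_crossing_rate(samples):
    """Fraction of adjacent sample pairs whose signs differ.

    Assumption for this sketch: a sample equal to zero counts as non-negative.
    """
    if len(samples) < 2:
        return 0.0
    crossings = sum(
        1
        for prev, curr in zip(samples, samples[1:])
        if (prev >= 0) != (curr >= 0)
    )
    return crossings / (len(samples) - 1)


# Hypothetical sound frame: the sign changes twice across three adjacent pairs.
frame = [1.0, -1.0, -2.0, 3.0]
zcr = zero_crossing_rate(frame)
```

A feature set for one subject could then be a list of such values, one per feature.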
For example, Table (1) shows the IQ sensing data of a radar.
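In another embodiment, the processor 12 may download or receive, through a communication transceiver (not shown), sensing data from an external sensor or feature sets generated by an external computing device.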
Different feature sets may correspond to the sensing data of different subjects or different target objects. For example, a first feature set is converted from the sensing data of a first subject, and a second feature set is converted from the sensing data of a second subject. Alternatively, different feature sets may correspond to sensing data of the same subject or the same target object at different times or in different environments. For example, a third feature set corresponds to the sensing data of a third subject in a first time period, and a fourth feature set corresponds to the sensing data of the third subject in a second time period.
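In one embodiment, the processor 12 may label one or more feature sets, for example with events such as hypopnea, wakefulness, or apnea. However, the label content may vary with the feature type and is not limited by the embodiments of the present invention.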
Dimensionality reduction analysis is used to reduce the number of features. That is, each feature is regarded as one dimension, so reducing dimensions also reduces features. In one embodiment, the dimensionality reduction analysis is principal component analysis (PCA) or principal coordinates analysis (PCoA). PCA uses an orthogonal transformation to linearly transform the observations of a set of possibly correlated variables (features, in this embodiment) and project them onto the values of a set of linearly uncorrelated variables. These uncorrelated variables are called principal components. In other words, PCA finds the most important elements and structure among multiple features. Unlike PCA, PCoA projects a distance matrix, which records the difference or distance between every two observations and is obtained by applying a chosen distance algorithm to the observations. PCoA then finds the principal coordinates in the distance matrix.
The analysis result may be the principal components and their proportions, or the principal coordinates and their proportions. The proportion refers to the share contributed by each principal component or principal coordinate. For example, FIG. 3 is a schematic diagram of an analysis result according to an embodiment of the present invention. Referring to FIG. 3, assume that the sensing data is sleep data sensed by a continuous wave (CW) radar, and that the corresponding labeled validation data is produced by overnight polysomnography (PSG). The comparison targets are sleep events such as hypopnea, wakefulness, or apnea. That is, the radar is used to predict sleep events. This embodiment analyzes data from 32 subjects. After the radar data of the 32 subjects is converted into features and processed by PCA/PCoA, the principal component structure shown in FIG. 3 is obtained. The analysis result includes principal components PC1~PC11 and their proportions, and principal component PC1 has the highest proportion.
In other embodiments, the dimensionality reduction analysis may be linear discriminant analysis (LDA), t-distributed stochastic neighbor embedding (t-SNE), or another dimensionality reduction technique. The analysis result then includes the reduced features or dimensions and their proportions.
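Referring to FIG. 2, the processor 12 may normalize the feature sets according to the analysis result to generate multiple normalized feature sets (step S220). Specifically, normalization scales feature values proportionally so that the scaled values fall within a specific interval (e.g., [0,1] or [0,10]). That is, the value of each feature in a feature set is scaled proportionally into the specific interval.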
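The proportional scaling into an interval can be sketched with min-max scaling. This is one common way to do it and is an assumption here; the function name and the handling of constant inputs are illustrative choices, not taken from the patent.

```python
def min_max_scale(values, low=0.0, high=1.0):
    """Scale values proportionally so they fall within [low, high]."""
    v_min, v_max = min(values), max(values)
    if v_max == v_min:
        # Assumption: a constant feature maps to the lower bound.
        return [low for _ in values]
    span = v_max - v_min
    return [low + (v - v_min) * (high - low) / span for v in values]


unit_scaled = min_max_scale([2, 4, 6])          # interval [0, 1]
ten_scaled = min_max_scale([2, 4, 6], 0, 10)    # interval [0, 10]
```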
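In one embodiment, the processor 12 selects one or more first principal components from the multiple principal components and normalizes the feature sets according to the first principal components. For example, the processor 12 sets the maximum and minimum values of the interval and normalizes each principal component so that they share the same reference point.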
In one embodiment, the first principal component is the principal component with the highest proportion. For example, the proportion of principal component PC1 in FIG. 3 is much larger than that of the other principal components PC2~PC11, so PC1 can be selected for the subsequent normalization.
In another embodiment, the first principal component is the principal component with the highest proportion or the principal component with the second-highest proportion, where the difference between the two proportions is less than a threshold (e.g., 3%, 5%, or 10%). For example, if the difference between the highest and second-highest proportions is within 5%, the principal component with the second-highest proportion is also selected. If any other principal component in the ranking also differs from the highest proportion by less than the threshold, it is likewise taken into account in the subsequent normalization.
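The threshold rule for choosing first principal components can be sketched as follows. The proportions in the example are hypothetical values chosen for illustration (they are not read from FIG. 3), and the function name is an assumption.

```python
def select_first_components(proportions, threshold=0.05):
    """Return indices of components whose proportion is within
    `threshold` of the highest proportion (the highest is always kept)."""
    top = max(proportions)
    return [i for i, p in enumerate(proportions) if top - p < threshold]


# Hypothetical case 1: one dominant component, so only index 0 is kept.
dominant = select_first_components([0.62, 0.15, 0.10])

# Hypothetical case 2: the top two differ by about 3%, so both are kept.
close_pair = select_first_components([0.40, 0.37, 0.10], threshold=0.05)
```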
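In one embodiment, the processor 12 may rank the feature sets through percentile transformation, that is, convert feature values into ranks. For example, Table (2) lists the features of one feature set.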
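Converting feature values into ranks can be sketched as below. The sample values are hypothetical, and breaking ties by original position is an assumption of this sketch; the patent does not state how ties are handled.

```python
def rank_transform(values):
    """Replace each value with its 1-based rank (smallest value gets rank 1).

    Assumption: ties are broken by original position, since sorted() is stable.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank
    return ranks


ranks = rank_transform([0.3, 2.5, 1.1])
```

After this transformation, a feature set with n features becomes a permutation of 1 to n, so feature sets with very different raw scales become directly comparable.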
FIG. 4 is a principal component distribution diagram according to an embodiment of the present invention. Referring to FIG. 4, the horizontal axis shows the features, arranged by index, and the vertical axis shows the numbers of the different subjects. The features differ from subject to subject. For example, the 17th and 18th features of subjects 5 and 10 behave differently (in importance, shown in the figure as different gray levels) from those of the other subjects.
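Referring to FIG. 2, the processor 12 generates a distance relationship of the normalized feature sets (step S230). Specifically, the distance relationship includes the distance between every two of the normalized feature sets. The processor 12 may project the features in the normalized feature sets into the same space to form coordinates, and calculate the distance in that space between the features of different normalized feature sets (i.e., the distance between two coordinates).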
In one embodiment, the distance relationship is a distance matrix, and each element in the distance matrix is the distance between the features of two normalized feature sets. The distance algorithm may be Euclidean distance, cosine similarity, or Kullback-Leibler (KL) divergence. For example, the first normalized feature set is [1.5,2.2], the second normalized feature set is [0.1,1.6], and the third normalized feature set is [5.7,4.3]. The distance matrix is then [1.52,4.7,6.22]. Taking Euclidean distance as an example, the square root of (1.5-0.1)^2+(2.2-1.6)^2 is 1.52, and the remaining entries follow in the same way.
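The pairwise Euclidean distances in this example can be reproduced with a short sketch. The function names are illustrative, and the output is a flat list ordered as (first, second), (first, third), (second, third), which matches the order of the three values given in the text.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pairwise_distances(feature_sets):
    """Distances for every unordered pair, in combinations() order."""
    return [euclidean(a, b) for a, b in combinations(feature_sets, 2)]


normalized_sets = [[1.5, 2.2], [0.1, 1.6], [5.7, 4.3]]
distances = [round(d, 2) for d in pairwise_distances(normalized_sets)]
print(distances)  # [1.52, 4.7, 6.22]
```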
The distance relationship is not limited to matrix form. In other embodiments, it may be a lookup table, a mathematical conversion formula, or any other relationship that records the distances between different feature sets.
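Referring to FIG. 2, the processor 12 groups the feature sets according to the distance relationship to generate multiple data groups (step S240). Specifically, each data group includes one or more feature sets. The distance relationship reveals the similarity between different feature sets, and clustering assigns feature sets with higher similarity to the same data group. The clustering method may be K-means clustering, hierarchical clustering, or fuzzy clustering.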
For example, FIG. 5 is a schematic diagram of grouping by hierarchical clustering according to an embodiment of the present invention. Referring to FIG. 5, each feature set number corresponds to one feature set (e.g., one subject). The processor 12 uses hierarchical clustering to group the closest feature sets into one of the data groups. The feature sets can be split into two groups between numbers 28 and 16 (e.g., the feature sets corresponding to subjects 16 and 28). Note that "closest" refers to the result of a comparison with a distance threshold: if the distance between two feature sets is less than the distance threshold, the two feature sets are regarded as closest; otherwise, they are regarded as not close.
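In one embodiment, the processor 12 may determine the number of data groups, determine a clustering distance according to that number, and group the feature sets according to the clustering distance. Taking FIG. 5 as an example, if the number of groups is 2, the clustering distance is 60. The distances among feature sets 5, 12, 11, 27, 19, 23, 30, 3, and 28 (e.g., the feature sets corresponding to subjects 5, 12, 11, 27, 19, 23, 30, 3, and 28) are within 60, so they are all assigned to the same data group. If the number of groups is 3, the clustering distance is 50. The distances among feature sets 16, 24, 10, 15, and 29 are within 50, so they are all assigned to the same data group.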
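Grouping by a target number of groups can be sketched with a small agglomerative procedure. Single linkage (the distance between two clusters is the distance between their closest members) is an assumption made for this sketch, since the patent does not name a linkage criterion, and the sample points are hypothetical.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def agglomerate(points, n_groups):
    """Repeatedly merge the two closest clusters until n_groups remain.

    Returns clusters as sorted lists of indices into `points`.
    """
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_groups:
        best = None  # (distance, index_a, index_b)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single linkage: closest pair of members across clusters.
                d = min(
                    euclidean(points[i], points[j])
                    for i in clusters[a]
                    for j in clusters[b]
                )
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a stays valid
    return [sorted(c) for c in clusters]


# Hypothetical normalized feature sets forming two obvious groups.
points = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
groups = agglomerate(points, n_groups=2)
```

Choosing a smaller number of groups corresponds to cutting the dendrogram at a larger clustering distance, which is the relationship the text describes for FIG. 5.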
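Referring to FIG. 2, the processor 12 trains multiple machine learning models using the data groups respectively (step S250). Specifically, once the grouping result is obtained, each data group can be trained separately from the other data groups, so the machine learning models are each trained using a different data group. The processor 12 may train the corresponding machine learning model using the feature sets of each data group (that is, the feature sets converted from the sensing data) or using the unconverted sensing data. For example, the feature sets of a first data group are used to train a first machine learning model, and the feature sets of a second data group are used to train a second machine learning model. The first data group is not used to train the second machine learning model. The machine learning algorithm may be deep learning, a decision tree, a recurrent neural network (RNN), or another algorithm.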
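The separation between groups during training can be sketched as one model per data group. `MajorityModel` is a hypothetical stand-in that only remembers the most common label; it is not the learner used in the patent and would be replaced by a real algorithm such as a decision tree or an RNN. The group contents are made-up examples.

```python
from collections import Counter


class MajorityModel:
    """Hypothetical placeholder learner: predicts the most common training label."""

    def fit(self, feature_sets, labels):
        self.label = Counter(labels).most_common(1)[0][0]
        return self

    def predict(self, feature_set):
        return self.label


def train_per_group(groups):
    """`groups` maps a group id to (feature_sets, labels).

    Each model sees only its own group's data.
    """
    return {gid: MajorityModel().fit(x, y) for gid, (x, y) in groups.items()}


# Made-up labeled groups for illustration only.
groups = {
    1: ([[0.1], [0.2], [0.3]], ["apnea", "apnea", "awake"]),
    2: ([[0.9], [0.8]], ["awake", "awake"]),
}
models = train_per_group(groups)
```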
The following validation results show that the group-wise training of the embodiments of the present invention benefits machine learning training.
FIG. 6 is a schematic diagram of the validation results of separate training on the first group according to an embodiment of the present invention, FIG. 7 is a schematic diagram of the validation results of separate training on the second group according to an embodiment of the present invention, and FIG. 8 is a schematic diagram of the validation results of joint training on multiple groups according to an embodiment of the present invention. Referring to FIG. 6, FIG. 7, and FIG. 8, accuracy is the rate at which the predictions produced by each machine learning model match the actual results. Sensitivity is the proportion of actually positive samples that are judged positive. Specificity is the proportion of actually negative samples that are judged negative.
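These three metrics follow directly from the counts of true and false positives and negatives. The sketch below assumes binary labels and returns 0.0 when a denominator would be zero; both choices, and the sample labels, are assumptions for illustration.

```python
def confusion_metrics(actual, predicted, positive=1):
    """Return (accuracy, sensitivity, specificity) for binary labels."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == positive and p == positive)
    tn = sum(1 for a, p in pairs if a != positive and p != positive)
    fp = sum(1 for a, p in pairs if a != positive and p == positive)
    fn = sum(1 for a, p in pairs if a == positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0  # true positive rate
    specificity = tn / (tn + fp) if (tn + fp) else 0.0  # true negative rate
    return accuracy, sensitivity, specificity


# Made-up labels: one of two positives is missed, both negatives are correct.
metrics = confusion_metrics(actual=[1, 1, 0, 0], predicted=[1, 0, 0, 0])
```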
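FIG. 6 shows the validation results of training with feature sets 5, 12, 11, 27, 19, 23, 30, 3, and 28 in FIG. 5 (e.g., the feature sets or raw sensing data corresponding to subjects 5, 12, 11, 27, 19, 23, 30, 3, and 28). FIG. 7 shows the validation results of training with the other feature sets in FIG. 5 (e.g., the feature sets or raw sensing data corresponding to subjects other than 5, 12, 11, 27, 19, 23, 30, 3, and 28). FIG. 8 shows the validation results of joint training with all feature sets in FIG. 5 (e.g., the feature sets or raw sensing data corresponding to all subjects). Group-wise training (FIG. 6 and FIG. 7) outperforms joint training (FIG. 8) in accuracy, sensitivity, and specificity. Taking accuracy as an example, under joint training the accuracy shown in FIG. 8 converges at about 0.7, whereas under group-wise training the accuracy shown in FIG. 6 and FIG. 7 converges above 0.7. The accuracy in FIG. 6, which covers only a single data group, even converges at about 0.9.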
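In addition to optimizing training, the embodiments of the present invention can optimize model prediction. FIG. 9 is a flowchart of data prediction according to an embodiment of the present invention. Referring to FIG. 9, the processor 12 may determine the distances between the data to be predicted and the data groups (step S910). Specifically, the processor 12 first obtains the data to be predicted. The data to be predicted is analogous to the sensing data described above, so the description is not repeated here. As needed, the processor 12 converts the data to be predicted into a feature set to be predicted. The feature conversion is analogous to the conversion from sensing data to feature sets described above and is not repeated here. The processor 12 then determines the distances between the feature set to be predicted and the multiple data groups.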
For example, the representative values (e.g., mean, median, or another statistic) of the first data group are [8.16, 9.8, 3.7, 15.54, 2.74, 4.04, 16.82, 4.56, 21, 11.88, 12.78, 11.1, 9.54, 7.22, 7.24, 18.34, 17.04, 4.24, 20, 12.1, 13.16], the representative values of the second data group are [4.61, 6.42, 9.95, 5.7, 4, 6.61, 2.85, 10.28, 21, 15.85, 14.66, 12.047, 8.28, 10.38, 9.95, 18.85, 16.42, 3.57, 20, 13.33, 16.09], and the feature set to be predicted is [10, 13, 6, 16, 2, 3, 17, 5, 21, 9, 15, 12, 8, 7, 4, 19, 18, 1, 20, 11, 14]. Taking Euclidean distance as an example, the distance between the feature set to be predicted and the first data group is 7.855, and the distance between the feature set to be predicted and the second data group is 23.495.
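The two distances in this example can be checked with a short sketch that uses the vectors given in the text. The variable and function names are illustrative; the computed values agree with the reported 7.855 and 23.495 to within rounding.

```python
import math

# Representative values of each data group, copied from the example.
GROUPS = {
    "group_1": [8.16, 9.8, 3.7, 15.54, 2.74, 4.04, 16.82, 4.56, 21, 11.88,
                12.78, 11.1, 9.54, 7.22, 7.24, 18.34, 17.04, 4.24, 20, 12.1,
                13.16],
    "group_2": [4.61, 6.42, 9.95, 5.7, 4, 6.61, 2.85, 10.28, 21, 15.85,
                14.66, 12.047, 8.28, 10.38, 9.95, 18.85, 16.42, 3.57, 20,
                13.33, 16.09],
}

# Feature set to be predicted, copied from the example.
TARGET = [10, 13, 6, 16, 2, 3, 17, 5, 21, 9, 15, 12, 8, 7, 4, 19, 18, 1,
          20, 11, 14]


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


distances = {name: euclidean(TARGET, rep) for name, rep in GROUPS.items()}
nearest = min(distances, key=distances.get)  # group with the shortest distance
```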
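The processor 12 may select, from the multiple machine learning models, the first machine learning model corresponding to the data group that has the shortest distance to the data to be predicted (step S920), and use the first machine learning model to predict the data to be predicted (step S930). For example, 7.855 is less than 23.495, so the data group with the shortest distance to the feature set to be predicted is the first data group. The processor 12 may load the first machine learning model of the first data group and input the data to be predicted into the loaded model to obtain a prediction result. If the data to be predicted is a radar sensing result, the prediction result may be a sleep event. However, the prediction result may vary according to actual needs.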
It should be noted that, in one embodiment, in response to the distances between multiple data groups and the data to be predicted being less than a lower distance limit or greater than an upper distance limit, the machine learning models of all of these data groups may be selected to predict the result for the data to be predicted. In another embodiment, in response to the distances between multiple data groups and the data to be predicted all being the same, or the distances being less than a preset value, the processor 12 may load a machine learning model jointly trained on multiple data groups to make the prediction.
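The fallback to a jointly trained model can be sketched as a small routing function. This is one possible reading of the second embodiment: the joint model is used when all distances are equal or when all of them fall below a preset value. The function name, the `preset` parameter, and the string placeholders standing in for models are assumptions for illustration.

```python
def choose_model(distances, group_models, joint_model, preset=None):
    """Pick a model for the data to be predicted.

    `distances` maps a group id to the distance from the data to be predicted.
    Falls back to `joint_model` when the groups cannot be told apart.
    """
    values = list(distances.values())
    all_equal = len(values) > 1 and len(set(values)) == 1
    all_below = preset is not None and all(d < preset for d in values)
    if all_equal or all_below:
        return joint_model
    nearest = min(distances, key=distances.get)
    return group_models[nearest]


# Placeholders standing in for trained models.
group_models = {1: "model_1", 2: "model_2"}

picked = choose_model({1: 7.855, 2: 23.495}, group_models, "joint_model")
tied = choose_model({1: 5.0, 2: 5.0}, group_models, "joint_model")
```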
In summary, in the data prediction method and apparatus of the embodiments of the present invention, the feature sets are normalized according to the result of dimensionality reduction and then grouped. Different machine learning models are then trained using different data groups. In addition, the machine learning model corresponding to the data group at the closest distance is selected for prediction. In this way, the effectiveness of both training and prediction can be improved.
Although the present invention has been disclosed above by way of embodiments, they are not intended to limit the present invention. Anyone with ordinary knowledge in the relevant technical field may make minor changes and refinements without departing from the spirit and scope of the present invention. The scope of protection of the present invention is therefore defined by the appended claims.
S910~S920: Steps
Claims (20)
Priority Applications (4)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US18/083,593 US20230409927A1 (en) | 2022-06-16 | 2022-12-19 | Data predicting method and apparatus |
| EP23154375.2A EP4293573A1 (en) | 2022-06-16 | 2023-02-01 | Data predicting method and apparatus |
| JP2023027824A JP7561477B2 (en) | 2022-06-16 | 2023-02-24 | Data prediction method and apparatus |
| US18/190,125 US20230404474A1 (en) | 2022-06-16 | 2023-03-27 | Evaluation method of sleep quality and computing apparatus related to sleep quality |
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US202263352644P | 2022-06-16 | 2022-06-16 | |
| US63/352,644 | 2022-06-16 |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| TW202401306A TW202401306A (en) | 2024-01-01 |
| TWI865935B true TWI865935B (en) | 2024-12-11 |
Family
ID=89135642
Family Applications (2)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| TW111137595A TWI865935B (en) | 2022-06-16 | 2022-10-03 | Data processing method and apparatus |
| TW112101793A TWI849690B (en) | 2022-06-16 | 2023-01-16 | Evaluation method of sleep quality and computing apparatus related to sleep quality |
Family Applications After (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| TW112101793A TWI849690B (en) | 2022-06-16 | 2023-01-16 | Evaluation method of sleep quality and computing apparatus related to sleep quality |
Country Status (2)
| Country | Link |
|---|---|
| CN (1) | CN117252268A (en) |
| TW (2) | TWI865935B (en) |
Citations (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| TW201725526A (en) * | 2015-09-30 | 2017-07-16 | 伊佛曼基因體有限公司 | Systems and methods for predicting treatment-regimen-related outcomes |
| TW201807624A (en) * | 2016-07-29 | 2018-03-01 | 美商鄧白氏公司 | Diagnostic engine and classifier for discovery of behavioral and other clusters relating to entity relationships to enhance derandomized entity behavior identification and classification |
| CN111210023A (en) * | 2020-01-13 | 2020-05-29 | 哈尔滨工业大学 | Automatic selection system and method for data set classification learning algorithm |
| CN111783093A (en) * | 2020-06-28 | 2020-10-16 | 南京航空航天大学 | A Soft Dependency-Based Malware Classification and Detection Method |
Family Cites Families (7)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| TW201223505A (en) * | 2010-12-14 | 2012-06-16 | Univ Nat Cheng Kung | Medical analysis method and apparatus |
| TW201612845A (en) * | 2014-08-28 | 2016-04-01 | Resmed Ltd | Method, system and apparatus for diagnosis, monitoring and treatment of respiratory disorders |
| CN108065916B (en) * | 2017-12-14 | 2021-04-09 | 中国人民解放军国防科技大学 | A non-contact sleep quality monitoring method based on bioradar |
| JP7202385B2 (en) * | 2017-12-22 | 2023-01-11 | レスメッド センサー テクノロジーズ リミテッド | Devices, systems and methods for health and medical sensing |
| WO2020104465A2 (en) * | 2018-11-19 | 2020-05-28 | Resmed Sensor Technologies Limited | Methods and apparatus for detection of disordered breathing |
| CN109480783B (en) * | 2018-12-20 | 2022-02-18 | 深圳和而泰智能控制股份有限公司 | Apnea detection method and device and computing equipment |
| CN115308734A (en) * | 2019-12-26 | 2022-11-08 | 华为技术有限公司 | Respiratory data calculation method and related equipment |
-
2022
- 2022-10-03 TW TW111137595A patent/TWI865935B/en active
- 2022-10-28 CN CN202211333180.6A patent/CN117252268A/en active Pending
-
2023
- 2023-01-16 TW TW112101793A patent/TWI849690B/en active
Patent Citations (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| TW201725526A (en) * | 2015-09-30 | 2017-07-16 | 伊佛曼基因體有限公司 | Systems and methods for predicting treatment-regimen-related outcomes |
| TW201807624A (en) * | 2016-07-29 | 2018-03-01 | 美商鄧白氏公司 | Diagnostic engine and classifier for discovery of behavioral and other clusters relating to entity relationships to enhance derandomized entity behavior identification and classification |
| CN111210023A (en) * | 2020-01-13 | 2020-05-29 | 哈尔滨工业大学 | Automatic selection system and method for data set classification learning algorithm |
| CN111783093A (en) * | 2020-06-28 | 2020-10-16 | 南京航空航天大学 | A Soft Dependency-Based Malware Classification and Detection Method |
Also Published As
| Publication number | Publication date |
|---|---|
| TWI849690B (en) | 2024-07-21 |
| TW202401306A (en) | 2024-01-01 |
| TW202400076A (en) | 2024-01-01 |
| CN117252268A (en) | 2023-12-19 |