
TWI845355B - System for judging input mode of form data - Google Patents


Info

Publication number
TWI845355B
Authority
TW
Taiwan
Prior art keywords
judgment
form data
data
judged
input
Prior art date
Application number
TW112123637A
Other languages
Chinese (zh)
Other versions
TW202501334A (en
Inventor
陳維超
張明淇
魏智斌
黃敬倫
藍祥予
Original Assignee
英業達股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 英業達股份有限公司
Priority to TW112123637A
Application granted
Publication of TWI845355B
Publication of TW202501334A

Landscapes

  • Input From Keyboards Or The Like (AREA)

Abstract

A system for judging the input mode of form data extracts a learning field information amount and a learning timestamp amount from each form data whose ground truth (manually input or automatically input) is known, and executes a learning calculation on them to generate a judgment calculation model. The system further extracts a judgment field information amount and a judgment timestamp amount from each form data to be judged, which has no ground truth, and accordingly generates a judgment result predicting whether that form data was manually input or automatically input. When a judgment result does not comply with a feedback ground truth, the system defines abnormally judged form data and a trace-back ground truth, and re-executes the learning calculation to revise the judgment calculation model.

Description

Form Data Input Method Judgment System

The present invention relates to a judgment system, and more particularly to a judgment system for judging whether form data was input manually or automatically.

To meet the needs of organizational management, system management, and information integration management, more and more form data must be sent to the data storage device (server) of a data management center for centralized management, where it serves as the data basis for analyzing specific matters. Some of this form data is generated by manual input, such as typing, selecting options, or handwriting combined with image recognition. The rest may be generated by automatic input, such as barcode reading, chip sensing, tag sensing, image recognition, or automatic fill-in or import by a system.

To accurately interpret and analyze what the form data means, precise statistical analysis must be performed through big data computation. This approach presupposes that the content of the form data is itself highly accurate, so that its true meaning is not misjudged. However, the error rate of manual input (typing, writing, or clicking) is often far higher than that of automatic input (barcode reading, chip sensing, tag sensing, image recognition, or automatic fill-in or import by a system).

Because the volume of form data is enormous, it is impractical for data management center staff to check each form for correctness one by one, so auxiliary verification software must be relied on. Even so, because form data is highly diverse, it is unlikely that dedicated verification software can be developed for every kind of form data.

If form data can be generated automatically, the workload of data management center staff can be greatly reduced, so promoting automatic input of form data is imperative. However, given the variety of usage scenarios and requirements, not all forms can be converted to automatic input in a short time, and some form data will inevitably still be generated by manual input.

Because the form data stored in the data storage device (server) of the data management center mixes form data generated by automatic input with form data generated by manual input, a new judgment technique is needed to determine which form data was input automatically and which was input manually. More verification resources can then be devoted to checking manually input form data, improving the overall accuracy of the form data.

The prior art lacks a technique for judging whether form data was generated by automatic or manual input, so verification resources cannot be concentrated on manually input form data, and the overall accuracy of the form data is difficult to improve. To solve this problem, one necessary technical means adopted by the present invention is to provide a form data input method judgment system (hereinafter the "judgment system"), which includes a data storage device and a judgment device.

The data storage device stores a plurality of ground-truth form data and a plurality of form data to be judged. The ground-truth form data correspondingly have a plurality of initial ground truths, each defining the form data as automatically input or manually input. The judgment device is communicatively connected to the data storage device to retrieve the ground-truth form data and the form data to be judged, and, after a judgment program is installed and executed, generates a feature extraction module, a supervised learning module, a judgment module, and a verification alert module.

The feature extraction module extracts, from each ground-truth form data, a learning field information amount and a learning timestamp amount that reflect data disorder, so that the ground-truth form data have a corresponding plurality of learning field information amounts and learning timestamp amounts. It likewise extracts, from each form data to be judged, a judgment field information amount and a judgment timestamp amount that reflect data disorder.

The supervised learning module performs a learning calculation based on the learning field information amounts, the learning timestamp amounts, and the initial ground truths corresponding to the ground-truth form data, thereby generating a judgment calculation model.

The judgment module judges each form data to be judged according to the judgment calculation model and that form data's judgment field information amount and judgment timestamp amount, generating for each a judgment result that predicts whether it was automatically input or manually input. A plurality of judgment results is thereby generated.

The verification alert module receives a plurality of feedback ground truths, each defining a form data to be judged as automatically input or manually input. When it verifies that the judgment result for a form data to be judged does not match the feedback ground truth, it issues an alert message and accordingly defines an abnormally judged form data and a trace-back ground truth. The verification alert module further stores the abnormally judged form data and the trace-back ground truth in the data storage device as ground-truth form data and initial ground truth, so that the supervised learning module can perform the learning calculation again and revise the judgment calculation model.
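The verify, alert, and re-learn loop described above might be sketched as follows. This is only a minimal illustration under stated assumptions, not the patented implementation: the function name `verify_judgments`, the dictionary layout, and the serial numbers and labels in the example are all hypothetical.

```python
def verify_judgments(judgments, feedback):
    """Compare judgment results with feedback ground truths.

    Both arguments map a form serial number to "A" (automatic) or "M" (manual).
    Returns (alerts, trace_back): alert messages for mismatches, and the
    abnormally judged forms paired with the ground truth to re-learn from.
    """
    alerts, trace_back = [], []
    for serial, truth in feedback.items():
        predicted = judgments.get(serial)
        if predicted is not None and predicted != truth:
            alerts.append(f"form {serial}: judged {predicted}, feedback {truth}")
            trace_back.append((serial, truth))
    return alerts, trace_back


# Hypothetical judgment results and operator feedback.
judgments = {"1001": "M", "1002": "A", "1003": "M"}
feedback = {"1002": "M", "1003": "M"}
alerts, trace_back = verify_judgments(judgments, feedback)
print(alerts)      # one alert, for form 1002
print(trace_back)  # [('1002', 'M')]
```

The entries in `trace_back` would then be appended to the training set before the learning calculation is run again.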

Among the subsidiary technical means derived from the above necessary technical means, the judgment system preferably further includes a plurality of data input terminal devices, from which the ground-truth form data and the form data to be judged can be transmitted to the data storage device for storage. The data storage device may be a data storage server, and the judgment device may be a computing server. Each data input terminal device further includes a feedback operation interface, through which its operator can input the corresponding feedback ground truth upon verifying that the judgment result for a form data to be judged is wrong.

Preferably, the verification alert module further includes a verification cycle setting interface for setting a verification cycle. According to the verification cycle, the abnormally judged form data and the trace-back ground truth are periodically adopted as ground-truth form data and initial ground truth, so that the supervised learning module periodically performs the learning calculation according to the verification cycle.

The feature extraction module may include a field information amount extraction unit, which operates according to a field entropy algorithm to obtain the learning field information amount and the judgment field information amount. The field entropy algorithm is $H = -\sum_{i=1}^{k} P_i \log_{10} P_i$, where $P_i = n_i / \sum_{m=1}^{k} n_m$, $k$ is the number of data field types (that is, there are $k$ types of data field in total), and $n_i$ is the number of fields of the $i$-th of those $k$ types; $i$, $k$, and $n_i$ are all natural numbers.

The feature extraction module may further include a timestamp amount extraction unit, which lets a user designate $q$ timestamp fields among $p$ fields and extracts the learning timestamp amount and the judgment timestamp amount according to a timestamp amount algorithm. The timestamp amount algorithm is $T = \max_{1 \le j \le q} d_j$, where $d_j$ is the number of distinct data contents contained in the $r$ rows of data of the $j$-th of the $q$ timestamp fields; $j$, $d_j$, $p$, $q$, and $r$ are all natural numbers, and $p > q$.

The judgment module may further include a marking unit, which assigns an automatic input mark or a manual input mark to each form data to be judged according to its judgment result, and then stores the form data in the data storage device.

As for the learning calculation, it may preferably include at least one basic training algorithm, the basic training algorithm including at least one of the K-nearest neighbors (KNN) algorithm, the support vector machine (SVM) algorithm, the decision tree algorithm, and the regression algorithm. More preferably, the learning calculation may further include at least one fitting algorithm, the fitting algorithm including at least one of the random forest algorithm and extreme gradient boosting (XGBoost).

In summary, the form data input method judgment system provided by the present invention draws on long-term observation of the correlations and regularities that distinguish automatically input form data from manually input form data. It specifically selects the field information amount and the timestamp amount, which relate to time and data disorder, as the key features for subsequent learning, training, and judgment. Supervised learning on these features establishes a judgment calculation model with a higher confidence level in a short time and yields judgment results with higher accuracy.

Furthermore, by periodically judging, verifying, alerting, and generating trace-back ground truths, erroneous judgment results can be corrected and the learning calculation re-run to revise the judgment calculation model. This not only achieves automatic judgment of the input method but also greatly improves judgment accuracy within a shorter time. Once judgment results with higher accuracy are obtained, verification resources (including personnel, equipment, and/or tool software) can be concentrated on checking manually input form data, further improving the overall accuracy of the form data.

Because the form data input method judgment system provided by the present invention can be widely used to judge whether form data was generated by automatic or manual input, its range of application is broad and is not enumerated here. Only one preferred embodiment is described in detail, and this embodiment serves only to explain the purpose and effect of the invention conveniently and clearly.

Refer to FIG. 1, which is a functional block diagram of the form data input method judgment system provided by the preferred embodiment of the present invention. As shown in FIG. 1, a form data input method judgment system (hereinafter the "judgment system") 100 includes a data storage device 1, a judgment device 2, and data input terminal devices 3a~3c.

The data storage device 1 may be a data storage server. The judgment device 2 may be a computing server. The data input terminal devices 3a~3c may be any terminal device capable of inputting form data, such as a computer built into work equipment, an industrial computer, a desktop computer, a laptop computer, a tablet computer, or a smartphone. They respectively have data input interfaces 31a~31c and feedback operation interfaces 32a~32c. The data input interfaces 31a~31c and feedback operation interfaces 32a~32c may be program pages shown after the data input terminal devices 3a~3c execute a specific program, or web pages presented on the data input terminal devices 3a~3c after connecting to a web server.

The data storage device 1 stores a plurality of ground-truth form data GF and a plurality of form data to be judged JF. Both can be input by at least one operator of the data input terminal devices 3a~3c through at least one of the data input interfaces 31a~31c and then transmitted to the data storage device 1 for storage. The ground-truth form data GF correspondingly have a plurality of initial ground truths, each defining the form data as automatically input (generated by automatic data input) or manually input (generated by manual data input). An initial ground truth is a fact that was already verified, and is therefore highly credible, before the judgment system 100 judges the form data to be judged JF. The form data to be judged JF are form data that the judgment system 100 has yet to judge as automatically or manually input.

For example, as shown in Table 1, the data storage device 1 stores 10 ground-truth form data with serial numbers 0001~0010. The initial ground truth of the ground-truth form data numbered 0001 and 0005 is manual input, which may be performed by typing, selecting options, or handwriting combined with text recognition software. To make them easy for personnel and software to identify and read, the ground-truth form data numbered 0001 and 0005 can be given the mark "M" to denote manual input. The initial ground truth of the other 8 ground-truth form data is automatic input, which may be performed by barcode reading, chip sensing, tag sensing, image recognition, or automatic fill-in or import by a system. Similarly, automatically input ground-truth form data can be given the mark "A".

Table 1: List of ground-truth form data

Serial number    Initial ground truth    Mark
0001             Manual input            M
0002             Automatic input         A
0003             Automatic input         A
0004             Automatic input         A
0005             Manual input            M
0006             Automatic input         A
0007             Automatic input         A
0008             Automatic input         A
0009             Automatic input         A
0010             Automatic input         A

The judgment device 2 is communicatively connected to the data storage device 1 to retrieve the ground-truth form data GF and the form data to be judged JF. A judgment program JAP is installed on it, and executing the judgment program JAP generates a feature extraction module 21, a supervised learning module 22, a judgment module 23, and a verification alert module 24.

After long-term observation, the inventors found that the following correlations and regularities commonly distinguish automatically input form data from manually input form data:
1. Automatic input can enter a large amount of data in a short time, so form data with a large amount of data entered in a short time is mostly automatically input.
2. Data with high similarity or high repetition is mostly automatically input, whereas data with high dissimilarity or low repetition is mostly manually input.
3. Because form data is entered field by field, the multiple rows of data in the same field mostly share the same type attribute (such as time, text, or number).
4. Properties such as similarity, dissimilarity, and repetition can be expressed through parameters or indicators related to data disorder.

Given the correlations that the inventors observed and summarized over the long term, parameters or indicators related to time and data disorder should be extracted with priority as the basis for learning, training, and subsequent judgment, in order to improve the judgment ability of the judgment device 2. On this premise, the feature extraction module 21 includes a field information amount extraction unit 211 and a timestamp amount extraction unit 212.

In the learning phase, the field information amount extraction unit 211 extracts from each ground-truth form data GF a learning field information amount that reflects data disorder. In the judgment phase, that is, when judging whether each form data to be judged JF was automatically or manually input, it extracts from the form data to be judged JF a judgment field information amount that reflects data disorder.

The field information amount extraction unit 211 may operate according to a field entropy algorithm to obtain the learning field information amount and the judgment field information amount. The field entropy algorithm is $H = -\sum_{i=1}^{k} P_i \log_{10} P_i$, where $P_i = n_i / \sum_{m=1}^{k} n_m$, $k$ is the number of data field types (that is, there are $k$ types of data field in total), and $n_i$ is the number of fields of the $i$-th of those $k$ types; $i$, $k$, and $n_i$ are all natural numbers. Because each $P_i$ is a probability and is therefore at most 1, its logarithm is never positive, so the leading minus sign is needed to restore a non-negative value.

Besides the field entropy algorithm above, the field information amount extraction unit 211 may also extract other parameters related to data disorder as the learning field information amount and the judgment field information amount, such as the data repetition rate (the number, total, or proportion of fields containing identical data) or the data similarity (the proportion of data content that is identical).
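As one possible reading of the "data repetition rate" mentioned above, the sketch below computes the proportion of values in a field that occur more than once. The source does not fix an exact definition, so the function name `repetition_rate` and this particular definition are assumptions; the sample values are the "model" column of Table 2.

```python
from collections import Counter


def repetition_rate(values):
    """Proportion of entries whose value appears more than once in the field.

    This is an assumed definition; the source names the indicator only loosely.
    """
    if not values:
        return 0.0
    counts = Counter(values)
    repeated = sum(n for n in counts.values() if n > 1)
    return repeated / len(values)


# "Model" column of Table 2 (14 rows).
models = ["G655", "E222", "V990", "R448", "E224", "E222", "G655",
          "G900", "V980", "E224", "U922", "V980", "V350", "R448"]
print(repetition_rate(models))  # 10 of the 14 entries are repeated values
```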

The timestamp amount extraction unit 212 may provide a timestamp designation interface (not shown) that lets a user designate $q$ timestamp fields among $p$ fields. In the learning phase it extracts the learning timestamp amount according to a timestamp amount algorithm, and in the judgment phase it extracts the judgment timestamp amount according to the same algorithm. The timestamp amount algorithm may be $T = \max_{1 \le j \le q} d_j$, where $d_j$ is the number of distinct data contents contained in the $r$ rows of data of the $j$-th of the $q$ timestamp fields; $j$, $d_j$, $p$, $q$, and $r$ are all natural numbers, and $p > q$.

The content of a timestamp field need not be a time itself, but it should preferably relate to time, for example a sequence number or serial number that reflects chronological order. In addition, although the timestamp amount algorithm in this embodiment takes the maximum of the numbers of distinct data contents over all timestamp fields, in practice the arithmetic mean, median, or mode of those numbers may also be used as the timestamp amount.
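The choice among maximum, arithmetic mean, median, and mode can be sketched as a small lookup over Python's standard `statistics` module. The name `aggregate_distinct_counts` and the sample counts are hypothetical and only illustrate the options named above.

```python
import statistics

# Aggregation options named in the text; the dictionary keys are assumed names.
AGGREGATORS = {
    "max": max,
    "mean": statistics.mean,
    "median": statistics.median,
    "mode": statistics.mode,
}


def aggregate_distinct_counts(distinct_counts, how="max"):
    """Combine the per-field distinct-content counts into one timestamp amount."""
    return AGGREGATORS[how](distinct_counts)


# Hypothetical distinct-content counts for three timestamp fields.
counts = [3, 14, 3]
print(aggregate_distinct_counts(counts))            # 14
print(aggregate_distinct_counts(counts, "median"))  # 3
print(aggregate_distinct_counts(counts, "mode"))    # 3
```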

Regarding extraction of the (learning or judgment) field information amount: for example, as shown in Table 2, a form data contains six fields: sales date, brand, model, quantity, unit price, and sales amount. Sales date is a time field; brand and model are text fields; quantity, unit price, and sales amount are numeric fields. There are thus three field types (time, text, and numeric), so $k = 3$. There is one time field, so $n_1 = 1$; there are two text fields, so $n_2 = 2$; and there are three numeric fields, so $n_3 = 3$. The probability of the first field type (time) is $P_1 = 1/6$; that of the second (text) is $P_2 = 2/6$; and that of the third (numeric) is $P_3 = 3/6$. Substituting into the field entropy algorithm $H = -\sum_{i=1}^{3} P_i \log_{10} P_i$ gives a field information amount of 0.4392473. When this form data is ground-truth form data GF, its learning field information amount is 0.4392473; when it is form data to be judged JF, its judgment field information amount is 0.4392473.
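The worked example above can be checked with a short Python sketch. The function name `field_entropy` and the type labels are illustrative assumptions, and the base-10 logarithm is inferred from the stated result 0.4392473 rather than given explicitly in the text.

```python
import math
from collections import Counter


def field_entropy(field_types):
    """Base-10 Shannon entropy of the distribution of field types."""
    counts = Counter(field_types)
    total = sum(counts.values())
    return -sum((n / total) * math.log10(n / total) for n in counts.values())


# Field types of the six fields in Table 2: sales date, brand, model,
# quantity, unit price, sales amount (the labels themselves are assumed).
table2_types = ["time", "text", "text", "number", "number", "number"]
print(round(field_entropy(table2_types), 7))  # 0.4392473
```

A form whose fields are all of one type yields zero under this definition, since the only probability is 1.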

Regarding extraction of the (learning or judgment) timestamp amount: for example, as shown in Table 2 and as described above, there are six fields, so $p = 6$. Of these six fields only the sales date relates to time, so the user can designate the "sales date" field as the timestamp field through the timestamp designation interface (not shown) provided by the timestamp amount extraction unit 212, giving $q = 1$. The timestamp field (the "sales date" field) has 14 rows of data, so $r = 14$. Those 14 rows contain only three distinct data contents, "March 26", "March 27", and "March 28", so the number of distinct data contents is 3, that is, $d_1 = 3$. Because there is only one timestamp field ($q = 1$), $T = \max(d_1) = 3$, so the timestamp amount is 3. When this form data is ground-truth form data GF, its learning timestamp amount is 3; when it is form data to be judged JF, its judgment timestamp amount is 3.
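The timestamp amount of the example above can be reproduced with a few lines of Python. The function name `timestamp_amount`, the dictionary-per-row layout, and the key `sales_date` are assumptions made for this sketch; the dates follow the 14 rows of Table 2.

```python
def timestamp_amount(rows, timestamp_fields):
    """Maximum, over the designated timestamp fields, of the distinct-content count."""
    return max(len({row[field] for row in rows}) for field in timestamp_fields)


# Sales dates of the 14 rows in Table 2: five, four, and five rows per day.
sales_dates = ["March 26"] * 5 + ["March 27"] * 4 + ["March 28"] * 5
rows = [{"sales_date": date} for date in sales_dates]
print(len(rows))                               # 14
print(timestamp_amount(rows, ["sales_date"]))  # 3
```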

Table 2: Form data

Sales date    Brand      Model    Quantity    Unit price    Sales amount
March 26      Brand A    G655     47          1280          60160
March 26      Brand B    E222     87          1100          95700
March 26      Brand C    V990     35          880           30800
March 26      Brand D    R448     25          1430          35750
March 26      Brand B    E224     72          1077          77544
March 27      Brand B    E222     67          1100          73700
March 27      Brand A    G655     90          1280          115200
March 27      Brand A    G900     18          880           15840
March 27      Brand C    V980     42          999           41958
March 28      Brand B    E224     24          1077          25848
March 28      Brand D    U922     25          889           22225
March 28      Brand C    V980     33          999           32967
March 28      Brand C    V350     72          1372          98784
March 28      Brand D    R448     36          1430          51480

In the example above, the six fields (sales date, brand, model, quantity, unit price, and sales amount) are laid out horizontally, but in practice the fields may also be laid out vertically. In that case the $r$ rows (horizontal) of data described above are replaced by $r$ columns (vertical) of data. The feature extraction, including extraction of the (learning or judgment) field information amount and the (learning or judgment) timestamp amount, is otherwise the same as described above with rows and columns interchanged, and is not repeated below.
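One simple way to handle a vertical layout is to transpose the table first and then reuse the row-oriented extraction unchanged. The helper name `transpose` and the small sample table are hypothetical.

```python
def transpose(table):
    """Swap rows and columns of a rectangular table given as a list of lists."""
    return [list(column) for column in zip(*table)]


# Hypothetical vertical layout: each inner list is a field name and its values.
vertical = [
    ["sales_date", "March 26", "March 27", "March 28"],
    ["brand", "Brand A", "Brand B", "Brand C"],
]
horizontal = transpose(vertical)
print(horizontal[0])  # ['sales_date', 'brand']
print(horizontal[1])  # ['March 26', 'Brand A']
```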

The supervised learning module 22 performs a learning calculation based on the learning field information amounts and learning timestamp amounts corresponding to the ground-truth form data GF, together with the initial ground truths (which can be represented by the automatic input mark A or the manual input mark M), thereby generating a judgment calculation model. The learning calculation may preferably include at least one basic training algorithm, the basic training algorithm including at least one of the K-nearest neighbors (KNN) algorithm, the support vector machine (SVM) algorithm, the decision tree algorithm, and the regression algorithm. More preferably, in addition to the basic training algorithm, the learning calculation may further include at least one fitting algorithm, the fitting algorithm including at least one of the random forest algorithm and extreme gradient boosting (XGBoost). The judgment calculation model is the mathematical model that is automatically derived by performing the learning calculation.

Because these learning techniques (both the basic training algorithms and the fitting algorithms) are mature, a person having ordinary skill in the art can use the algorithms above, alone or in combination, to build the judgment calculation model; they are not described further here.

The judgment module 23 may include a judgment unit 231 and a marking unit 232. The judgment unit 231 judges each form data to be judged JF based on the judgment calculation model and on the judgment field information quantity and judgment timestamp quantity corresponding to that form data, producing a judgment result that predicts whether the form data was automatically input or manually input; a plurality of judgment results is thereby generated. The marking unit 232 assigns an automatic input mark A or a manual input mark M to each form data to be judged according to its judgment result, and the marked form data is then stored in the data storage device 1.

Put simply, in the learning stage the judgment calculation model is trained to derive the initial ground truth (which can be represented by the automatic input mark A or the manual input mark M) from the learning field information quantity and the learning timestamp quantity, so that in the judgment stage it can predict the judgment result (likewise representable by the mark A or M) from the judgment field information quantity and the judgment timestamp quantity.

For example, as shown in Table 3, the data storage device 1 also stores 10 form data to be judged JF, with serial numbers 1001 to 1010, so 10 judgment results are generated once judgment is complete. The form data with serial numbers 1001, 1003 and 1005 are judged to have been manually input, so the marking unit 232 assigns them the manual input mark M; the remaining form data are judged to have been automatically input, and the marking unit 232 assigns them the automatic input mark A.

Table 3: Judgment results and marks of the form data to be judged
Serial number of form data to be judged | Judgment result | Mark
1001 | Manual input | M
1002 | Automatic input | A
1003 | Manual input | M
1004 | Automatic input | A
1005 | Manual input | M
1006 | Automatic input | A
1007 | Automatic input | A
1008 | Automatic input | A
1009 | Automatic input | A
1010 | Automatic input | A
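The marking step shown in Table 3 can be sketched as a simple mapping from judgment result to mark. The dictionary layout, the result strings "manual" and "automatic", and the function name `mark_forms` are assumptions made for illustration.

```python
from typing import Dict

# Mapping from judgment result to the mark assigned by the marking unit.
MARKS = {"automatic": "A", "manual": "M"}


def mark_forms(results: Dict[int, str]) -> Dict[int, str]:
    """Assign mark A or M to each serial number according to its judgment result."""
    return {serial: MARKS[result] for serial, result in results.items()}


# Judgment results as listed in Table 3.
results = {serial: "automatic" for serial in range(1001, 1011)}
for serial in (1001, 1003, 1005):
    results[serial] = "manual"

marks = mark_forms(results)
manual_serials = sorted(serial for serial, mark in marks.items() if mark == "M")
print(manual_serials)
```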

After the judgment results are stored in the data storage device 1, the judgment device 2 can send a push notification to the data input terminal devices 3a~3c, asking their operators to verify whether the judgment results are correct. When an operator finds that the judgment result for a form data to be judged is wrong, the operator enters the corresponding feedback ground truth through the feedback operation interface 32a~32c, and that form data is defined as judgment-abnormal form data. When the operator finds that the judgment result is correct, a judgment-correct message can be entered instead. Both the feedback ground truth and the judgment-correct message are transmitted to the judgment device 2.

The verification alarm module 24 may include a verification unit 241, an alarm unit 242, a verification cycle setting interface 243 and a judgment accuracy calculation unit 244. When the judgment device 2 receives a feedback ground truth, the verification unit 241 defines the corresponding form data to be judged as judgment-abnormal form data and determines that the judgment result does not match the feedback ground truth. The verification unit 241 then records the feedback ground truth as the trace-back ground truth for that judgment-abnormal form data, and the alarm unit 242 issues a judgment-abnormal alert message.

Conversely, when the judgment device 2 receives a judgment-correct message, the verification unit 241 defines the corresponding form data to be judged as judgment-correct form data and determines that the judgment result matches the ground truth. The verification unit 241 then records the judgment result itself as the trace-back ground truth for that judgment-correct form data. Next, the verification unit 241 can transmit the judgment-abnormal form data and its trace-back ground truth to the data storage device 1 for storage, as newly added ground truth form data and initial ground truth respectively, so that the supervised learning module 22 can perform the learning calculation again and revise the judgment calculation model.

The verification cycle setting interface 243 is used to set a verification cycle. According to this cycle, the judgment-abnormal form data and trace-back ground truths are periodically taken as ground truth form data and initial ground truths, so that the supervised learning module 22 periodically performs the learning calculation and thereby periodically revises the judgment calculation model. The judgment accuracy calculation unit 244 counts how many of the form data to be judged JF were defined as judgment-abnormal form data and how many as judgment-correct form data, and from these counts calculates the judgment accuracy for each verification cycle. The verification cycle can be chosen according to how often judgment accuracy statistics are needed, the volume of form data, or other requirements; it can be set to, for example, once a day, once a week, once a month or once a quarter.

For example, continuing from the judgment results in Table 3 and as shown in Table 4, after verifying the judgment results the operators of the data input terminal devices 3a~3c find that the form data with serial number 1003 was judged as manually input when it was in fact automatically input. One of the feedback operation interfaces 32a~32c can therefore be used to enter "automatic input" as the feedback ground truth. The verification unit 241 then defines the form data with serial number 1003 as judgment-abnormal form data, records the feedback ground truth (automatic input) as its trace-back ground truth, and stores it in the data storage device 1. The alarm unit 242 issues a judgment-abnormal alert message, indicating that the judgment result produced for serial number 1003 by the currently established model is wrong.

Conversely, if the operators find that the judgment results for the remaining serial numbers are all correct, they can enter a judgment-correct message through the feedback operation interfaces 32a~32c. The verification unit 241 then defines those form data as judgment-correct form data, records their judgment results directly as the trace-back ground truths, and stores them in the data storage device 1 as well.
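The verification logic in this example can be sketched as follows. This is a minimal illustration under assumptions: feedback is modeled as a dictionary holding only the operator's corrections, a missing entry stands for a judgment-correct message, and the `verify` function and the alert text are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def verify(
    judgments: Dict[int, str],
    feedback: Dict[int, Optional[str]],
) -> Tuple[Dict[int, str], List[int]]:
    """Return the trace-back ground truths and the serial numbers judged abnormally.

    A missing or None feedback entry is treated as a judgment-correct message.
    """
    trace_back: Dict[int, str] = {}
    abnormal: List[int] = []
    for serial, judged in sorted(judgments.items()):
        corrected = feedback.get(serial)
        if corrected is not None and corrected != judged:
            # Judgment result disagrees with the feedback ground truth.
            trace_back[serial] = corrected
            abnormal.append(serial)
            print(f"ALERT: judgment abnormal for form {serial}")
        else:
            # Judgment confirmed: the result itself becomes the trace-back ground truth.
            trace_back[serial] = judged
    return trace_back, abnormal


# Judgment results from Table 3, and the single correction from the example.
judgments = {serial: "A" for serial in range(1001, 1011)}
for serial in (1001, 1003, 1005):
    judgments[serial] = "M"
feedback = {1003: "A"}

trace_back, abnormal = verify(judgments, feedback)

# Rows that would be added to the ground truth form data for retraining.
new_training_rows = {serial: trace_back[serial] for serial in abnormal}
print(abnormal, new_training_rows)
```

Here only the abnormal forms are fed back for retraining; the description also allows the correct forms to be included.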

During each verification cycle, multiple verified form data to be judged JF and their trace-back ground truths can be accumulated. On entering the next verification cycle, some or all of the accumulated verified form data (for example, only the judgment-abnormal form data, or both the judgment-abnormal and the judgment-correct form data) and their trace-back ground truths are used as newly added ground truth form data and corresponding initial ground truths, respectively.

The judgment accuracy calculation unit 244 counts that, of the 10 form data to be judged, only 1 (serial number 1003) is judgment-abnormal form data and the other 9 are judgment-correct form data. It can therefore calculate that, in this verification cycle, the judgment system 100 judged whether the form data to be judged JF were automatically or manually input with a judgment accuracy of 9/10 = 90%. The judgment accuracy calculation unit 244 can also report the form data automation rate for the cycle: according to the trace-back ground truths in Table 4, there were 8 automatically input form data and 2 manually input form data in this cycle, so the automation rate is 8/10 = 80%.

Table 4: Results of verifying the judgment results
Serial number of form data to be judged | Judgment result | Feedback ground truth | Trace-back ground truth | Judgment correctness
1001 | Manual input | - | Manual input | Correct
1002 | Automatic input | - | Automatic input | Correct
1003 | Manual input | Automatic input | Automatic input | Incorrect
1004 | Automatic input | - | Automatic input | Correct
1005 | Manual input | - | Manual input | Correct
1006 | Automatic input | - | Automatic input | Correct
1007 | Automatic input | - | Automatic input | Correct
1008 | Automatic input | - | Automatic input | Correct
1009 | Automatic input | - | Automatic input | Correct
1010 | Automatic input | - | Automatic input | Correct
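The two figures derived from Table 4 can be reproduced with a short sketch. The function names and the dictionary layout are assumptions; the data are the judgment results and trace-back ground truths from the table.

```python
from typing import Dict


def judgment_accuracy(judgments: Dict[int, str], trace_back: Dict[int, str]) -> float:
    """Share of forms whose judgment result equals the trace-back ground truth."""
    correct = sum(1 for serial in judgments if judgments[serial] == trace_back[serial])
    return correct / len(judgments)


def automation_rate(trace_back: Dict[int, str]) -> float:
    """Share of forms whose trace-back ground truth is automatic input (mark A)."""
    automatic = sum(1 for mark in trace_back.values() if mark == "A")
    return automatic / len(trace_back)


# Judgment results (Table 3): 1001, 1003 and 1005 judged manual, the rest automatic.
judgments = {serial: "A" for serial in range(1001, 1011)}
for serial in (1001, 1003, 1005):
    judgments[serial] = "M"

# Trace-back ground truths (Table 4): 1003 was corrected to automatic input.
trace_back = dict(judgments)
trace_back[1003] = "A"

accuracy = judgment_accuracy(judgments, trace_back)
automation = automation_rate(trace_back)
print(f"accuracy={accuracy:.0%} automation={automation:.0%}")
```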

After several verification cycles of judging, verifying and re-running the learning calculation to revise the judgment calculation model, the accuracy with which the judgment system 100 judges whether form data to be judged JF were automatically or manually input can rise gradually. Once the judgment accuracy reaches a target accuracy (for example 99.99%) or higher, the judgment capability of the judgment system 100 has reached a certain level of confidence. The verification cycle can then be lengthened (for example from once a quarter to once a year), or the judgment results can even be accepted directly; that is, every judgment result produced by the judgment system 100 is treated as ground truth, with no further verification.
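One way to express the cycle adjustment is sketched below. The policy of stepping the cycle one level at a time, the list of cycle names and the function `next_cycle` are assumptions; the description only says that the cycle can be lengthened once the target accuracy is reached.

```python
# Verification cycles ordered from shortest to longest (names are illustrative).
CYCLES = ["daily", "weekly", "monthly", "quarterly", "yearly"]


def next_cycle(current: str, accuracy: float, target: float = 0.9999) -> str:
    """Lengthen the cycle by one step once accuracy reaches the target; otherwise keep it."""
    index = CYCLES.index(current)
    if accuracy >= target and index < len(CYCLES) - 1:
        return CYCLES[index + 1]
    return current


kept = next_cycle("quarterly", 0.90)        # below target, cycle unchanged
extended = next_cycle("quarterly", 0.9999)  # target reached, cycle lengthened
print(kept, extended)
```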

Furthermore, inspection resources (including personnel, equipment and/or software tools) can be concentrated on inspecting form data generated by manual input, that is, form data bearing the manual input mark M (in particular ground truth form data GF verified as manually input), and on correcting the errors found in it, so as to raise the overall correctness of the form data. In addition, by raising the sampling inspection rate for manually input form data and lowering it for automatically input form data, the overall correctness of the form data can be raised efficiently without increasing the total inspection workload (the total inspection resources invested).
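The sampling idea can be illustrated with a small allocation sketch. The weighting scheme, the `manual_weight` parameter and the example volumes are all assumptions; the description does not specify how the sampling rates are chosen.

```python
from typing import Tuple


def inspection_counts(
    n_manual: int, n_auto: int, budget: int, manual_weight: float = 4.0
) -> Tuple[int, int]:
    """Split a fixed inspection budget between manual and automatic forms.

    Each manual form is weighted manual_weight times as heavily as an
    automatic form, so manual forms are sampled at a higher rate.
    """
    weighted_total = n_manual * manual_weight + n_auto
    if weighted_total == 0:
        return 0, 0
    manual = min(n_manual, round(budget * n_manual * manual_weight / weighted_total))
    auto = min(n_auto, budget - manual)
    return manual, auto


# Invented volumes: 200 manual forms, 800 automatic forms, 100 inspections in total.
manual_checked, auto_checked = inspection_counts(200, 800, 100)
manual_rate = manual_checked / 200
auto_rate = auto_checked / 800
print(manual_checked, auto_checked, manual_rate, auto_rate)
```

With a uniform 10% sampling rate the same budget would inspect 20 manual and 80 automatic forms; the weighted split moves inspections toward the manual forms without changing the total.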

Because the feature extraction module 21, supervised learning module 22, judgment module 23 and verification alarm module 24 are all generated by executing the judgment program JAP, each of them can in essence be (part of) a main program or subprogram of the judgment program JAP, or a program page or functional interface produced when the judgment program JAP is executed. A person having ordinary skill in the art (particularly in the field of artificial intelligence algorithms) can, following the learning and judgment logic above, write the judgment program JAP (including its main program or subprograms) in a suitable programming language to provide the functions of these modules and thereby implement the techniques of the present invention.

In summary, the form data input mode judgment system 100 provided by the present invention draws on long-term observation of the correlations and regularities that distinguish automatically input form data from manually input form data. It selects the field information quantity and the timestamp quantity, which relate to time and data disorder, as the key features for subsequent training and judgment, and performs supervised learning on them, so that a judgment calculation model with a higher confidence level, and judgment results with higher judgment accuracy, are obtained in a short time.

Furthermore, the judgment device 2 periodically judges, verifies, issues alerts and generates trace-back ground truths, so that wrong judgment results are corrected and the learning calculation is re-run to revise the judgment calculation model. This not only achieves automatic judgment of the input mode but also raises judgment accuracy substantially in a relatively short time. Once judgment results with higher accuracy are available, inspection resources (including personnel, equipment and/or software tools) can be concentrated on inspecting form data generated by manual input, further raising the overall correctness of the form data.

The detailed description of the preferred embodiments above is intended to describe the features and spirit of the present invention more clearly, not to limit the scope of the present invention to the preferred embodiments disclosed. On the contrary, it is intended to cover various modifications and equivalent arrangements within the scope of the claims of the present invention.

100: Judgment system
1: Data storage device
2: Judgment device
21: Feature extraction module
211: Field information quantity extraction unit
212: Timestamp quantity extraction unit
22: Supervised learning module
23: Judgment module
231: Judgment unit
232: Marking unit
24: Verification alarm module
241: Verification unit
242: Alarm unit
243: Verification cycle setting interface
244: Judgment accuracy calculation unit
3a~3c: Data input terminal devices
31a~31c: Data input interfaces
32a~32c: Feedback operation interfaces
GF: Ground truth form data
JF: Form data to be judged
JAP: Judgment program

FIG. 1 is a functional block diagram of the form data input mode judgment system provided by a preferred embodiment of the present invention.


Claims (10)

1. A form data input mode judgment system, comprising:
a data storage device storing a plurality of ground truth form data and a plurality of form data to be judged, the ground truth form data correspondingly having a plurality of initial ground truths each defining automatic input or manual input; and
a judgment device communicatively connected to the data storage device to retrieve the ground truth form data and the form data to be judged, and, after a judgment program is installed and executed, generating:
a feature extraction module that extracts, from each of the ground truth form data, a learning field information quantity and a learning timestamp quantity reflecting data disorder, so that the ground truth form data have a corresponding plurality of learning field information quantities and a plurality of learning timestamp quantities, and that extracts, from each of the form data to be judged, a judgment field information quantity and a judgment timestamp quantity reflecting data disorder;
a supervised learning module that performs a learning calculation, based on the learning field information quantities and the learning timestamp quantities corresponding to the ground truth form data and on one of the initial ground truths, to generate a judgment calculation model;
a judgment module that judges each of the form data to be judged, based on the judgment calculation model and on the judgment field information quantity and the judgment timestamp quantity corresponding to that form data, to generate a judgment result predicting whether that form data is automatically input or manually input, thereby generating a plurality of judgment results; and
a verification alarm module that receives a plurality of feedback ground truths defining the form data to be judged as automatically input or manually input, and that, on verifying that one of the judgment results corresponding to one of the form data to be judged does not match one of the feedback ground truths, issues an alert message, accordingly defines a judgment-abnormal form data and a trace-back ground truth, and stores the judgment-abnormal form data and the trace-back ground truth in the data storage device as one of the ground truth form data and one of the initial ground truths, so that the supervised learning module performs the learning calculation again and revises the judgment calculation model.

2. The form data input mode judgment system of claim 1, further comprising a plurality of data input terminal devices, wherein the ground truth form data and the form data to be judged are transmitted from the data input terminal devices to the data storage device for storage.

3. The form data input mode judgment system of claim 2, wherein each of the data input terminal devices further comprises a feedback operation interface through which an operator of that data input terminal device, on verifying that one of the judgment results corresponding to one of the form data to be judged is wrong, enters the corresponding one of the feedback ground truths.

4. The form data input mode judgment system of claim 3, wherein the verification alarm module further comprises a verification cycle setting interface for setting a verification cycle, whereby, according to the verification cycle, the judgment-abnormal form data and the corresponding trace-back ground truth are periodically used as one of the ground truth form data and the corresponding one of the initial ground truths, respectively, so that the supervised learning module periodically performs the learning calculation according to the verification cycle.

5. The form data input mode judgment system of claim 4, wherein the verification alarm module further comprises a judgment accuracy calculation unit for counting, in the verification cycle, the number of the form data to be judged for which judgment has been completed and the number defined as judgment-abnormal form data, and for calculating a judgment accuracy accordingly.

6. The form data input mode judgment system of claim 1, wherein the data storage device is a data storage server and the judgment device is a computing server.

7. The form data input mode judgment system of claim 1, wherein the feature extraction module further comprises a field information quantity extraction unit that operates according to a field entropy algorithm to obtain the learning field information quantity and the judgment field information quantity, the field entropy algorithm being [formula not reproduced in this text], where [expression not reproduced in this text], k denotes the number of data field types (there being k data field types in total), [symbol not reproduced in this text] denotes the count of the i-th of the k data field types, and i, k and [symbol not reproduced in this text] are all natural numbers.

8. The form data input mode judgment system of claim 1, wherein the feature extraction module further comprises a timestamp quantity extraction unit that lets a user designate q timestamp fields among p fields and that extracts the learning timestamp quantity and the judgment timestamp quantity according to a timestamp quantity algorithm, the timestamp quantity algorithm being [formula not reproduced in this text], where [symbol not reproduced in this text] denotes the number of distinct data content types contained in the r rows of data corresponding to the j-th of the q timestamp fields, j, [symbol not reproduced in this text], p, q and r are all natural numbers, and p > q.

9. The form data input mode judgment system of claim 1, wherein the judgment module further comprises a marking unit that assigns an automatic input mark or a manual input mark to each of the form data to be judged according to the judgment result, after which the form data is stored in the data storage device.

10. The form data input mode judgment system of claim 1, wherein the learning calculation performed by the supervised learning module comprises at least one basic training algorithm and at least one fitting algorithm, the at least one basic training algorithm comprising at least one of a K-nearest neighbors (KNN) algorithm, a support vector machine (SVM) algorithm, a decision tree algorithm and a regression algorithm, and the at least one fitting algorithm comprising at least one of a Random Forest algorithm and Extreme Gradient Boosting (XGBoost).
TW112123637A 2023-06-26 2023-06-26 System for judging input mode of form data TWI845355B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
TW112123637A TWI845355B (en) 2023-06-26 2023-06-26 System for judging input mode of form data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
TW112123637A TWI845355B (en) 2023-06-26 2023-06-26 System for judging input mode of form data

Publications (2)

Publication Number Publication Date
TWI845355B true TWI845355B (en) 2024-06-11
TW202501334A TW202501334A (en) 2025-01-01

Family

ID=92541729

Family Applications (1)

Application Number Title Priority Date Filing Date
TW112123637A TWI845355B (en) 2023-06-26 2023-06-26 System for judging input mode of form data

Country Status (1)

Country Link
TW (1) TWI845355B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TW200421152A (en) * 2003-02-01 2004-10-16 Baxter Int Remote multi-purpose user interface for a healthcare system
TW200632801A (en) * 2004-10-08 2006-09-16 Univ Utah Res Found System for supervised remote training
WO2022133210A2 (en) * 2020-12-18 2022-06-23 Strong Force TX Portfolio 2018, LLC Market orchestration system for facilitating electronic marketplace transactions
CN116113967A (en) * 2020-07-16 2023-05-12 强力交易投资组合2018有限公司 System and method for controlling rights related to digital knowledge


Also Published As

Publication number Publication date
TW202501334A (en) 2025-01-01
