
TWI844473B - System and method for processing confidential data using online large language model - Google Patents


Info

Publication number
TWI844473B
TWI844473B TW112134488A TW112134488A TWI844473B TW I844473 B TWI844473 B TW I844473B TW 112134488 A TW112134488 A TW 112134488A TW 112134488 A TW112134488 A TW 112134488A TW I844473 B TWI844473 B TW I844473B
Authority
TW
Taiwan
Prior art keywords
data
language model
large language
module
online
Prior art date
Application number
TW112134488A
Other languages
Chinese (zh)
Other versions
TW202511996A (en)
Inventor
蔡佩芸
李佳珞
劉姿嘉
邱泊寰
Original Assignee
卡米爾股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 卡米爾股份有限公司
Priority to TW112134488A
Application granted
Publication of TWI844473B
Publication of TW202511996A


Landscapes

  • Storage Device Security (AREA)
  • Computer And Data Communications (AREA)

Abstract

A system and method for processing confidential data using an online large language model. The system includes a data privacy enhancement module, a computing module, a packaging module and a transmission module. The data privacy enhancement module generates a plurality of computational data and provides them to the computing module. The computing module includes an online large language model, into which the computational data is input for analysis or reasoning to generate program code or a guiding principle. The packaging module packages the program code or guiding principle into a data package in a form that can be run by a closed large language model in a closed intranet, and the transmission module then transmits the data package to the closed intranet. While protecting confidential data, the present invention uses the strong analysis and reasoning capabilities of the online large language model to improve the capabilities of the closed large language model and achieve high-quality data analysis and reasoning.

Description

System and method for processing confidential data using online large language model

The present invention relates to a data processing method, and more particularly to a system and method for processing confidential data using an online large language model.

With the development of big data and artificial intelligence technology, large language models such as GPT-4 are already capable of high-quality data analysis and reasoning. However, because confidential data must be protected, sending it directly to an online large language model for analysis may leak it. How to protect confidential data while still using the strong analysis and reasoning capabilities of online large language models is therefore an important problem in current technological development.

In view of this, and to address the above deficiencies of the prior art and future needs, the present invention proposes a system and method for processing confidential data using an online large language model. The specific architecture and implementation are described in detail below:

The main object of the present invention is to provide a system and method for processing confidential data using an online large language model, which uses the online large language model to perform data analysis or reasoning and to generate program code or guiding principles usable by a closed large language model, so that the closed large language model can analyze and reason about the confidential data according to that program code or those guiding principles, thereby improving its capabilities.

Another object of the present invention is to provide a system and method for processing confidential data using an online large language model, which can either remove sensitive information from confidential data before providing it to the online large language model for analysis, or generate simulation data for the online large language model, thereby obtaining high-quality data analysis results while preventing leakage of the sensitive information in the confidential data.

To achieve the above objects, the present invention provides a system for processing confidential data using an online large language model, including: a data privacy enhancement module that generates a plurality of computational data; a computing module connected to the data privacy enhancement module, the computing module including an online large language model, receiving the computational data and inputting it into the online large language model for analysis or reasoning to generate program code or a guiding principle; a packaging module connected to the computing module, which packages the program code or guiding principle into a data package in a form that can be run by a closed large language model in a closed intranet; and a transmission module connected to the packaging module, which transmits the data package to the closed intranet.

According to an embodiment of the present invention, the data privacy enhancement module includes a desensitization data processing module for removing sensitive information from the confidential data of a plurality of customers, the result serving as the computational data.

According to an embodiment of the present invention, the data privacy enhancement module includes a simulation data generation module for generating simulation data of a plurality of simulated customers and using the simulation data as the computational data.

According to an embodiment of the present invention, the program code generated by the online large language model is used by the closed large language model to perform data analysis.

According to an embodiment of the present invention, the guiding principle generated by the online large language model is a prompt-engineering design for various documents.

According to an embodiment of the present invention, the packaging module packages the program code or guiding principle as a Docker container, a compressed package, a database, or a JSON or CSV file.

According to an embodiment of the present invention, the transmission module is configured to transmit the data package via USB, a hard disk, temporarily enabled network permissions, or an encrypted one-way network.

A method for processing confidential data using an online large language model includes the following steps: generating a plurality of computational data; inputting the computational data into an online large language model for analysis or reasoning to generate program code or a guiding principle; packaging the program code or guiding principle into a data package in a form that can be run by a closed large language model in a closed intranet; and transmitting the data package to the closed intranet.

The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention.

It should be understood that, when used in this specification and the appended claims, the terms "include" and "comprise" indicate the presence of the described features, integers, steps, operations, elements and/or components, but do not exclude the presence or addition of one or more other features, integers, steps, operations, elements, components and/or combinations thereof.

It should also be understood that the terms used in this specification are for the purpose of describing specific embodiments only and are not intended to limit the present invention. As used in this specification and the appended claims, the singular forms "a", "an" and "the" are intended to include the plural forms unless the context clearly indicates otherwise.

It should be further understood that the term "and/or" used in this specification and the appended claims refers to any and all possible combinations of one or more of the associated listed items, and includes these combinations.

The present invention provides a system and method for processing confidential data using an online large language model, with the aim of using the strong analysis and reasoning capabilities of the online large language model to improve those of a closed large language model. When data must be processed but contains confidential information that cannot be leaked, or the data itself is confidential and contains sensitive information, it can only be analyzed and reasoned about by a closed large language model on the intranet. However, the analysis and reasoning capabilities of the closed large language model are limited, roughly at or below the level of an elementary school student, whereas the analysis capability of the online large language model is roughly that of a postdoctoral researcher. The present invention therefore first uses the online large language model to analyze and reason about a set of computational data that contains no confidential information, obtains program code or guiding principles (a Standard Operating Procedure, SOP), and then provides the program code or guiding principles to the closed large language model. In this way, the online large language model raises the capability of the closed large language model. The system architecture and step flow of the present invention are described in detail below.

Please refer to FIG. 1, which is a block diagram of the system 10 of the present invention for processing confidential data using an online large language model. The system 10 includes a data privacy enhancement module 12, a computing module 14, a packaging module 16 and a transmission module 18. The data privacy enhancement module 12 is connected to the computing module 14, the computing module 14 is connected to the packaging module 16, the packaging module 16 is connected to the transmission module 18, and the transmission module 18 is connected to a closed intranet 20 via the Internet.

The data privacy enhancement module 12 generates a plurality of computational data. Specifically, it includes a desensitization data processing module 122 and a simulation data generation module 124. The desensitization data processing module 122 removes sensitive information from the confidential data of a plurality of customers, and the result is used as computational data. The simulation data generation module 124 generates simulation data of a plurality of simulated customers and uses the simulation data as computational data. The present invention therefore has two ways of producing computational data: using the original confidential data with its sensitive information removed, or generating virtual data.

The computing module 14 includes an online large language model 142 with strong analysis and reasoning capabilities, such as GPT-4, the fourth-generation model behind ChatGPT. The computing module 14 receives the computational data and inputs it into the online large language model 142 for analysis or reasoning to generate program code or a guiding principle. The program code may be pseudocode used by the closed large language model for data analysis, and the guiding principle is a prompt-engineering design for various documents. In other words, both the program code and the guiding principle are step-by-step outlines that the closed large language model can follow as a template when analyzing and reasoning about other data.

The packaging module 16 packages the program code or guiding principle into a data package in a form that can be run by a closed large language model in a closed intranet. For example, the packaging module 16 can package the program code or guiding principle as a Docker container, a compressed package, a database, or a JSON or CSV file.
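The patent lists several packaging forms but does not specify their internal layout. As a minimal sketch of the compressed-package form, assuming a hypothetical archive layout (`analysis.py`, `guidelines.json`, `manifest.json`) chosen only for illustration, Python's standard `zipfile` and `json` modules are enough to bundle generated code and guidelines and read them back on the intranet side:

```python
import io
import json
import zipfile
from typing import Dict, List


def build_package(code_text: str, guidelines: List[str]) -> bytes:
    """Bundle generated code and guidelines into an in-memory ZIP (hypothetical layout)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        # Generated program code travels as a plain Python source file.
        zf.writestr("analysis.py", code_text)
        # Guidelines are stored as JSON so the closed side can load them directly.
        zf.writestr("guidelines.json",
                    json.dumps({"guidelines": guidelines}, ensure_ascii=False, indent=2))
        # A small manifest lists the payload files in the archive.
        zf.writestr("manifest.json",
                    json.dumps({"files": ["analysis.py", "guidelines.json"], "format_version": 1}))
    return buf.getvalue()


def read_package(blob: bytes) -> Dict[str, str]:
    """Unpack the archive and return {file name: text} for every file in the manifest."""
    with zipfile.ZipFile(io.BytesIO(blob)) as zf:
        manifest = json.loads(zf.read("manifest.json"))
        return {name: zf.read(name).decode("utf-8") for name in manifest["files"]}
```

The returned bytes can be written to whatever medium the transmission module uses; a Docker image or database would need different tooling.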

The transmission module 18 transmits the data package produced by the packaging module 16 to the closed intranet 20 for use by the closed large language model 202 in the closed intranet 20. In embodiments of the present invention, the transmission module 18 may transmit the data package via USB, a hard disk, temporarily enabled network permissions, or an encrypted one-way network.

Please also refer to FIG. 2, which is a flow chart of the method of the present invention for processing confidential data using an online large language model. In step S10, the data privacy enhancement module 12 generates a plurality of computational data. In step S12, the computational data is input into the online large language model 142 of the computing module 14, which analyzes or reasons about it to generate program code or a guiding principle. In step S14, the packaging module 16 packages the program code or guiding principle into a data package in a form that can be run by a closed large language model in a closed intranet. Finally, in step S16, the transmission module 18 transmits the data package to the closed intranet 20, and the closed large language model 202 analyzes and reasons about the real confidential data according to the program code or guiding principle. In this way, the closed large language model 202 draws on the capabilities of the online large language model 142 to obtain high-quality data analysis and reasoning results.
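Steps S10 to S16 can be sketched as a single function. This is a hypothetical illustration, not the patented implementation: the data source, the online large language model and the transfer channel are passed in as plain callables (`make_data`, `online_llm` and `transmit`, all assumed names), so no real model API is invoked.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class DataPackage:
    kind: str     # "code" or "guideline"
    payload: str  # text returned by the online LLM
    fmt: str      # packaging form, e.g. "docker", "zip", "json", "csv"


def run_pipeline(
    make_data: Callable[[], List[Dict]],
    online_llm: Callable[[str], str],
    request: str,
    transmit: Callable[[DataPackage], None],
    kind: str = "code",
    fmt: str = "json",
) -> DataPackage:
    # S10: obtain privacy-enhanced computational data (desensitized or simulated rows).
    rows = make_data()
    # S12: the online LLM sees only the request plus these rows, never raw confidential data.
    prompt = request + "\n" + "\n".join(repr(r) for r in rows)
    output = online_llm(prompt)
    # S14: wrap the generated code or guideline in a form the closed LLM can run.
    package = DataPackage(kind=kind, payload=output, fmt=fmt)
    # S16: hand the package to the transfer channel toward the closed intranet.
    transmit(package)
    return package
```

In practice `online_llm` would wrap whichever hosted model is used and `transmit` would write to the chosen channel (USB drive, one-way link, and so on); both are left abstract here.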

The desensitization data processing module 122 in the data privacy enhancement module 12 removes sensitive information by keyword. For example, several sensitive-word libraries are set up in advance: personal data such as national ID numbers and telephone numbers, or private information such as bank account numbers and amounts, are defined as sensitive terms and stored in the sensitive-word library. The desensitization data processing module 122 then deletes the sensitive information from the confidential data, or masks it with mosaics or other forms of obfuscation.
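A minimal sketch of keyword-library desensitization follows. The patent does not define the library's contents; the regular expressions below (Taiwanese national ID and mobile number formats, any 12 to 16 digit run treated as an account number, `NT$` amounts) are illustrative assumptions, and `SENSITIVE_PATTERNS` and `desensitize` are hypothetical names. Masking replaces each matched character with `*`, a simple stand-in for the "mosaic" option.

```python
import re

# Hypothetical sensitive-term library: category -> regex (assumed formats).
SENSITIVE_PATTERNS = {
    "national_id": re.compile(r"\b[A-Z][12]\d{8}\b"),         # letter + 1/2 + 8 digits
    "phone": re.compile(r"\b09\d{2}-?\d{3}-?\d{3}\b"),         # mobile numbers such as 0912-345-678
    "bank_account": re.compile(r"\b\d{12,16}\b"),             # long digit runs (heuristic)
    "amount": re.compile(r"NT\$\s?\d+(?:,\d{3})*"),            # amounts such as NT$12,000
}


def desensitize(text: str, mode: str = "mask") -> str:
    """Delete or mask every match of every pattern in the library, in insertion order."""
    for pattern in SENSITIVE_PATTERNS.values():
        if mode == "delete":
            text = pattern.sub("", text)
        else:
            # Keep the original length so the layout of the record is preserved.
            text = pattern.sub(lambda m: "*" * len(m.group()), text)
    return text
```

A keyword approach only catches what the library describes, so the patterns would need to be tuned to the actual data before relying on them.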

The simulation data generation module 124 in the data privacy enhancement module 12 uses artificial intelligence to automatically generate simulation data from requirements entered by the user. The artificial intelligence may be GPT-4. For example, if the input requirement is "Please generate 10 rows of virtual air pollution data with five fields: device_id, lat, lon, pm2_5, and time", the simulation data generation module 124 can generate the simulation data 126 shown in FIG. 3.
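The patent generates simulation data with GPT-4. The sketch below swaps in Python's seeded pseudo-random generator instead, only to show the shape of the output for the example request (10 rows with `device_id`, `lat`, `lon`, `pm2_5`, `time`). The value ranges, the rough latitude/longitude box around Taiwan and the start timestamp are all assumptions, and every value is synthetic.

```python
import csv
import io
import random
from datetime import datetime, timedelta
from typing import Dict, List

FIELDS = ["device_id", "lat", "lon", "pm2_5", "time"]


def simulate_air_quality(n_rows: int = 10, seed: int = 0) -> List[Dict]:
    """Generate synthetic air pollution rows (stand-in for an LLM-based generator)."""
    rng = random.Random(seed)               # seeded: same request -> same rows
    start = datetime(2023, 1, 1)            # arbitrary start time (assumption)
    rows = []
    for i in range(n_rows):
        rows.append({
            "device_id": f"dev_{rng.randint(1, 5):03d}",
            "lat": round(rng.uniform(22.0, 25.3), 4),    # assumed latitude range
            "lon": round(rng.uniform(120.0, 122.0), 4),  # assumed longitude range
            "pm2_5": round(rng.uniform(5.0, 80.0), 1),   # assumed range, µg/m³
            "time": (start + timedelta(hours=i)).isoformat(),
        })
    return rows


def to_csv(rows: List[Dict]) -> str:
    """Serialize rows with a header line in FIELDS order."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```

Because no real customer record is involved, rows like these can be sent to the online model without the masking step.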

The confidential data with sensitive information removed, and the simulation data 126, serve as computational data from which the online large language model 142 generates program code or guiding principles. Here, a guiding principle is a prompt-engineering design for various documents. For example, for principles on framing air pollution data analysis tasks, the input "Please provide prompt templates for air pollution data analysis, so that the closed large language model can use these templates as principles for formulating more insightful data analysis questions" lets the online large language model 142 produce the guiding principles in Table I:

Table I
1. Air pollution trend analysis: Analyze the air pollution data of the past five years and try to find any significant trends or patterns. For example, is pollution particularly severe in a certain season, time period, or weather condition? Time series analysis can be used to address this question.
2. Relationship between air pollution and population health: Use public health data (e.g., the incidence of respiratory diseases) to explore the relationship between air pollution and population health. This may require regression analysis or other statistical methods to determine the relationships between variables.
3. Identification of air pollution sources: Try to identify potential sources of air pollution, such as factory emissions, vehicle emissions, or agricultural activities. This may require factor analysis or cluster analysis.
4. Air pollution prediction model: Use historical data to build a model that predicts future air pollution levels. Machine learning prediction models such as decision trees, random forests, or deep learning can be used.
5. Evaluation of air pollution policy effectiveness: Evaluate how a particular policy or intervention affects air pollution levels. This may require controlled experiments or regional analyses.
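One way a guiding principle could reach the closed model in the JSON form listed for the packaging module is as structured prompt templates. The sketch below condenses two Table I entries into a hypothetical schema (`id`, `title`, `instruction`, `method_hint`) and fills one into a prompt. The schema and the `render_prompt` helper are assumptions, not part of the patent.

```python
import json
from typing import List

# Two Table I entries, condensed into an assumed JSON schema.
GUIDELINES_JSON = json.dumps([
    {"id": 1, "title": "Air pollution trend analysis",
     "instruction": "Find significant trends or patterns by season, time period or weather.",
     "method_hint": "time series analysis"},
    {"id": 4, "title": "Air pollution prediction model",
     "instruction": "Build a model that predicts future air pollution levels from historical data.",
     "method_hint": "decision trees, random forests or deep learning"},
])


def render_prompt(guidelines_json: str, guideline_id: int, columns: List[str]) -> str:
    """Fill one guideline into a prompt for the closed LLM (hypothetical template)."""
    by_id = {g["id"]: g for g in json.loads(guidelines_json)}
    g = by_id[guideline_id]  # raises KeyError for an unknown guideline id
    return (
        f"Task: {g['title']}\n"
        f"Data columns: {', '.join(columns)}\n"
        f"Instruction: {g['instruction']}\n"
        f"Suggested method: {g['method_hint']}"
    )
```

Keeping the guideline structured rather than as free prose means the weaker closed model only has to follow a filled-in template.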

For the program code, refer to FIG. 4. Suppose the data of FIG. 3 is input into the online large language model together with the request "Write Python code that computes the average value; this Python code will later be packaged into Docker and placed in the closed intranet for the closed large language model to call and execute, so wrap it in functions as much as possible." The online large language model can then generate the program code 128 shown in FIG. 4. The user can make further requests to obtain the code best suited to the closed large language model. For example, given the follow-up request "Please do not put the simulation data in the code, because I want to read the .csv data file dynamically," the online large language model can rewrite the code and output the program code 128' shown in FIG. 5.
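FIG. 5 is not reproduced in this text. The following is a hypothetical sketch of code in the spirit of the second request: it wraps the averaging in functions and reads the `.csv` file at run time instead of embedding the simulation data. The function names and the default `pm2_5` column are assumptions.

```python
import csv
import statistics
import sys
from typing import Iterable


def mean_from_lines(lines: Iterable[str], column: str = "pm2_5") -> float:
    """Average one numeric column of CSV text, skipping empty cells."""
    values = [
        float(row[column])
        for row in csv.DictReader(lines)
        if (row.get(column) or "").strip()
    ]
    if not values:
        raise ValueError(f"no numeric values in column {column!r}")
    return statistics.fmean(values)


def column_mean(csv_path: str, column: str = "pm2_5") -> float:
    """Read the CSV at run time rather than hard-coding the data."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        return mean_from_lines(f, column)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python <script>.py data.csv [column]
    print(column_mean(sys.argv[1], *sys.argv[2:3]))
```

Inside a container on the closed intranet this might be invoked as `python mean_pm25.py data.csv`, where the script name is likewise hypothetical.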

The guiding principles in Table I and the program code 128' in FIG. 5 can both be packaged by the packaging module 16 into a data package that can be transmitted to and directly executed by the closed large language model 202, for example in the Docker container form requested in FIG. 4 or with the .csv file input requested in FIG. 5. The transmission module 18 then transmits the data package to the closed large language model 202 in the closed intranet 20, which directly executes the data package and uses the program code 128' or the guiding principles for analysis and reasoning.

In summary, the present invention provides a system and method for processing confidential data using an online large language model, which uses the online large language model to perform data analysis or reasoning and to generate program code or guiding principles usable by a closed large language model, so that the closed large language model can analyze and reason about confidential data according to that program code or those guiding principles, thereby improving its capabilities. Therefore, although the closed large language model is comparatively weak, by following the program code or guiding principles provided by the online large language model it can still obtain high-quality inference results, while confidential data is kept from leaking and information security is ensured.

The above is only a preferred embodiment of the present invention and is not intended to limit its scope. All equivalent changes or modifications made according to the features and spirit described in the scope of the present invention shall be included in the scope of the claims of the present invention.

10: System for processing confidential data using an online large language model

12: Data privacy enhancement module

122: Desensitization data processing module

124: Simulation data generation module

126: Simulation data

128, 128': Program code

14: Computing module

142: Online large language model

16: Packaging module

18: Transmission module

20: Closed intranet

202: Closed large language model

FIG. 1 is a block diagram of the system of the present invention for processing confidential data using an online large language model.
FIG. 2 is a flow chart of the method of the present invention for processing confidential data using an online large language model.
FIG. 3 is a schematic diagram of the simulation data generated by the simulation data generation module of the present invention.
FIGS. 4 and 5 are schematic diagrams of program code generated using the online large language model according to the present invention.


Claims (10)

1. A system for processing confidential data using an online large language model, comprising: a data privacy enhancement module that generates a plurality of computational data, wherein the data privacy enhancement module comprises a simulation data generation module for generating simulation data of a plurality of simulated customers and using the simulation data as the computational data; a computing module connected to the data privacy enhancement module, the computing module comprising an online large language model, wherein the computing module receives the computational data and inputs the computational data into the online large language model for analysis or reasoning to generate program code or a guiding principle; a packaging module connected to the computing module, which packages the program code or the guiding principle into a data package in a form that can be run by a closed large language model in a closed intranet; and a transmission module connected to the packaging module, which transmits the data package to the closed intranet.

2. The system for processing confidential data using an online large language model as claimed in claim 1, wherein the data privacy enhancement module comprises a desensitization data processing module for removing sensitive information from the confidential data of a plurality of customers to serve as the computational data.
3. The system for processing confidential data using an online large language model as claimed in claim 1, wherein the program code generated by the online large language model is used by the closed large language model to perform data analysis.

4. The system for processing confidential data using an online large language model as claimed in claim 1, wherein the guiding principle generated by the online large language model is a prompt-engineering design for various documents.

5. The system for processing confidential data using an online large language model as claimed in claim 1, wherein the packaging module packages the program code or the guiding principle as a Docker container, a compressed package, a database, or a JSON or CSV file.

6. The system for processing confidential data using an online large language model as claimed in claim 1, wherein the transmission module is configured to transmit the data package via USB, a hard disk, temporarily enabled network permissions, or an encrypted one-way network.
7. A method for processing confidential data using an online large language model, comprising the following steps: generating simulation data of a plurality of simulated customers and configuring the simulation data as a plurality of computational data; inputting the computational data into an online large language model for analysis or reasoning to generate program code or a guiding principle; packaging the program code or the guiding principle into a data package in a form that can be run by a closed large language model in a closed intranet; and transmitting the data package to the closed intranet.

8. The method for processing confidential data using an online large language model as claimed in claim 7, wherein the program code generated by the online large language model is used by the closed large language model to perform data analysis.

9. The method for processing confidential data using an online large language model as claimed in claim 7, wherein the guiding principle generated by the online large language model is a prompt-engineering design for various documents.

10. The method for processing confidential data using an online large language model as claimed in claim 7, wherein the program code or the guiding principle is packaged as a Docker container, a compressed package, a database, or a JSON or CSV file.
TW112134488A 2023-09-11 2023-09-11 System and method for processing confidential data using online large language model TWI844473B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
TW112134488A TWI844473B (en) 2023-09-11 2023-09-11 System and method for processing confidential data using online large language model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
TW112134488A TWI844473B (en) 2023-09-11 2023-09-11 System and method for processing confidential data using online large language model

Publications (2)

Publication Number | Publication Date
TWI844473B | 2024-06-01
TW202511996A | 2025-03-16

Family ID: 92541542

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
TW112134488A (TWI844473B) | System and method for processing confidential data using online large language model | 2023-09-11 | 2023-09-11

Country Status (1)

Country | Link
TW (1) | TWI844473B

Citations (5)

* Cited by examiner, † Cited by third party
Publication Number | Priority Date | Publication Date | Assignee | Title
TW201931164A * | 2017-12-29 | 2019-08-01 | 大陸商中國銀聯股份有限公司 | Text quality index obtaining method and apparatus
CN116436704A * | 2023-06-13 | 2023-07-14 | 深存科技(无锡)有限公司 | Data processing method and data processing device for user privacy data
US20230274094A1 * | 2021-08-24 | 2023-08-31 | Unlikely Artificial Intelligence Limited | Computer implemented methods for the automated analysis or use of data, including use of a large language model
CN116680093A * | 2023-05-20 | 2023-09-01 | 数字郑州科技有限公司 | LLM-based web application optimization system and service implementation method and system
CN116680573A * | 2023-06-30 | 2023-09-01 | 重庆珏盛教育科技有限公司 | Language model construction training method for generating self-introduction through limited identity information


Also Published As

Publication Number | Publication Date
TW202511996A | 2025-03-16

Similar Documents

Publication Publication Date Title
Chen et al. SQL injection attack detection and prevention techniques using deep learning
Wang et al. A survey on ChatGPT: AI–generated contents, challenges, and solutions
Paulus et al. AdvPrompter: Fast adaptive adversarial prompting for LLMs
Kobayashi et al. Weight decay induces low-rank attention layers
Lyle et al. Normalization and effective learning rates in reinforcement learning
Abroshan et al. A phishing mitigation solution using human behaviour and emotions that influence the success of phishing attacks
TW202331564A (en) Data poisoning method and data poisoning apparatus
Mmaduekwe Bias and fairness issues in artificial intelligence-driven cybersecurity
TWI844473B (en) System and method for processing confidential data using online large language model
Sood Combating cyberattacks targeting the AI ecosystem: Assessing threats, risks, and vulnerabilities
Singh et al. Bias-aware agent: enhancing fairness in AI-driven knowledge retrieval
Bhardwaj et al. Conversational AI: Introduction to Chatbot's Security Risks, Their Probable Solutions, and the Best Practices to Follow
Yu et al. Error correction output codes for robust neural networks against weight-errors: A neural tangent kernel point of view
Barlag et al. Graph neural networks and arithmetic circuits
CN120602194B (en) A generative adversarial-driven intelligent security defense method and system
Pesati Security considerations for large language model use: Implementation research in securing llm-integrated applications
Rana et al. Generative AI Security: Defense, Threats, and Vulnerabilities
Veerasamy et al. Unpacking AI Security Considerations
Kalavasis et al. Injecting undetectable backdoors in obfuscated neural networks and language models
Joshi Gen AI in Financial Cybersecurity: A Comprehensive Review of Architectures, Algorithms, and Regulatory Challenges
Badhe ScamAgents: How AI agents can simulate human-level scam calls
CN119202183A (en) A data flow method for connecting multiple RAG systems
Thomas Enhancing Scalability and Transparency in AI-Driven Credit Scoring: Optimizing Explainability for Large-Scale Financial Systems
Nafea et al. Ethical Implications of Language Models
Park Development of chatbot using knowledge graph and ai agent