CN119203244A - A governance method for large model data leakage - Google Patents

Info

Publication number
CN119203244A
Authority
CN
China
Prior art keywords
characteristic
port
model training
data
training terminal
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202411686760.2A
Other languages
Chinese (zh)
Other versions
CN119203244B (en)
Inventor
孙伟
赵正伟
周红艳
刘莹莹
张垚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhonghe Yunke Information Technology Group Co ltd
Original Assignee
Zhonghe Yunke Information Technology Group Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhonghe Yunke Information Technology Group Co ltd
Priority to CN202411686760.2A
Publication of CN119203244A
Application granted
Publication of CN119203244B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Bioethics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Hardware Design (AREA)
  • Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention relates to the field of data management, and in particular to a governance method for large model data leakage. Regular features are screened from instruction response data to establish a feature behavior database for a model training terminal. When the model training terminal is accessed, instruction response data collected in real time are compared with the data in the feature behavior database to determine a behavior fit characterization coefficient for the terminal, from which the feature fit category of the current model training terminal is identified so that the terminal's data interaction can be controlled in time. Under the weak feature fit category, the data transmission pattern of each normal feature port is verified to determine whether the corresponding normal feature port should be closed.

Description

A governance method for large model data leakage
Technical Field
The invention relates to the field of data management, and in particular to a governance method for large model data leakage.
Background
With the development of computer and Internet technology, data models built on different architectures are widely applied in many fields, and access terminals are used extensively in these applications. Such terminals, especially terminals used for model training, may hold large amounts of sensitive data, so terminal security is important, and technologies for terminal security and data leakage prevention are receiving increasing attention.
For example, CN114676456A discloses a data privacy protection method, device, and storage medium based on edge computing, relating to the technical field of information security. The method comprises: acquiring a plurality of training sub-models in a federated machine learning mechanism, wherein each training sub-model is obtained by a different end user training an initial model on a plurality of original data items, the original data being collected from different end users; training the target training sub-model on a target edge device according to the model parameters to obtain a target training model; and detecting, according to the target training model, whether each original data item carries a risk of data leakage. That application is stated to address the high risk of data leakage in the related art.
However, when different terminals are accessed by authorized visitors, differences in operating habits mean that the terminal's data interaction and the instructions of the corresponding instruction input device show a certain regularity in the data dimension. The prior art does not use this regularity to identify abnormal access in time, so terminal security is low and data is prone to leakage.
Disclosure of Invention
To solve the problem described above, namely that the prior art does not use the regularity shown by a terminal's data interaction and by the instructions of the corresponding instruction input device under authorized access to identify abnormal access in time, which leaves terminal security low and data prone to leakage, the invention provides a governance method for large model data leakage, comprising:
acquiring instruction response data of a model training terminal within a historical period, wherein the instruction response data comprise the data interaction volume between the model training terminal and each port and the instruction trigger targets of the corresponding instruction input device;
screening regular features based on the instruction response data to establish a feature behavior database for the model training terminal, wherein the regular features comprise feature port combinations and feature functional component combinations;
in response to the model training terminal being accessed, collecting instruction response data of the model training terminal within a period and comparing them with the data in the feature behavior database;
determining, according to the comparison result, a behavior fit characterization coefficient for the model training terminal so as to identify the feature fit category of the current model training terminal;
controlling data interaction of the model training terminal based on the feature fit category of the current access behavior, comprising:
calling the feature behavior database to identify normal feature ports, verifying the data transmission pattern of each normal feature port to determine whether to close the corresponding normal feature port, and simultaneously sending verification request information to the model training terminal so as to determine, according to the terminal's feedback information, whether to issue an early warning signal;
or continuing to collect, within each period, the interaction data of the model training terminal and the instruction data of the corresponding instruction input device to determine the feature fit category of the model training terminal.
Further, the process of screening feature port combinations based on the instruction response data comprises:
determining a first conditional probability that the model training terminal interacts with each port combination over a plurality of access sessions;
determining feature port combinations based on the first conditional probability of each port combination, recording the mean data interaction volume of each port in the feature port combination, and storing it in the feature behavior database;
wherein a port combination is determined to be a feature port combination if its first conditional probability is greater than a predetermined port combination conditional probability threshold.
Further, the process of screening feature functional component combinations based on the instruction response data comprises:
acquiring a second conditional probability that each functional component combination is triggered over a plurality of access sessions of the model training terminal;
determining feature functional component combinations based on the second conditional probability of each functional component combination, and recording the trigger frequency of each functional component in the feature functional component combination;
wherein a functional component combination is determined to be a feature functional component combination if its second conditional probability is greater than a predetermined functional component combination probability threshold.
Further, the process of comparing the instruction response data collected from the model training terminal within a period with the data in the feature behavior database comprises:
recording the ports currently generating data interaction and, if the current port combination exists in the feature behavior database, determining that the port combination matches and recording the current mean data interaction volume of each port;
calculating, for each port, the difference between the current mean data interaction volume and the mean data interaction volume stored in the feature behavior database, and averaging the resulting differences to obtain a data interaction volume matching feature;
recording the functional components currently triggered and, if the current functional component combination exists in the feature behavior database, determining that the functional component combination matches and recording the current trigger frequency of each functional component;
calculating, for each functional component, the difference between the current trigger frequency and the trigger frequency stored in the feature behavior database, and averaging the resulting differences to obtain a trigger frequency matching feature.
Further, the process of determining the behavior fit characterization coefficient for the model training terminal according to the comparison result comprises:
calculating the ratio of a preset reference data interaction volume matching threshold to the data interaction volume matching feature to obtain a first fit characterization factor;
calculating the ratio of a preset reference trigger frequency matching threshold to the trigger frequency matching feature to obtain a second fit characterization factor;
determining the sum of the first fit characterization factor and the second fit characterization factor as the behavior fit characterization coefficient.
Further, the process of identifying the feature fit category of the current model training terminal comprises:
if the behavior fit characterization coefficient is greater than or equal to a preset fit characterization coefficient threshold, determining that the current model training terminal is in the strong feature fit category;
if the behavior fit characterization coefficient is less than the preset fit characterization coefficient threshold, determining that the current model training terminal is in the weak feature fit category.
Further, controlling data interaction of the model training terminal based on the feature fit category of the current access behavior comprises:
if the current model training terminal is in the weak feature fit category, calling the feature behavior database to identify normal feature ports, verifying the data transmission pattern of each normal feature port to determine whether to close the corresponding normal feature port, and simultaneously sending verification request information to the model training terminal so as to determine, according to the terminal's feedback information, whether to issue an early warning signal;
if the current model training terminal is in the strong feature fit category, continuing to collect, within each period, the interaction data of the model training terminal and the instruction data of the corresponding instruction input device to determine the feature fit category of the model training terminal.
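The category decision and the control branch it selects can be illustrated with a minimal Python sketch. The names `FitCategory`, `classify`, and `plan_actions`, as well as the action labels, are hypothetical and are not taken from the patent; the threshold is simply whatever value has been preset.

```python
from enum import Enum


class FitCategory(Enum):
    STRONG = "strong feature fit"
    WEAK = "weak feature fit"


def classify(coefficient, threshold):
    """Strong fit if the coefficient reaches the preset threshold, else weak."""
    return FitCategory.STRONG if coefficient >= threshold else FitCategory.WEAK


def plan_actions(category):
    """Map a category to illustrative action labels (hypothetical names)."""
    if category is FitCategory.WEAK:
        # Weak fit: verify the normal feature ports and challenge the terminal.
        return ["verify_normal_feature_ports", "send_verification_request"]
    # Strong fit: keep collecting data and re-evaluate in the next period.
    return ["continue_monitoring"]


category = classify(coefficient=1.8, threshold=2.0)
actions = plan_actions(category)
```

In this sketch a coefficient equal to the threshold counts as a strong fit, matching the "greater than or equal to" condition stated above.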
Further, the process of calling the feature behavior database to identify normal feature ports comprises:
arranging the feature port combinations in descending order of their first conditional probabilities to obtain a feature port combination sequence;
selecting a preset proportion of feature port combination serial numbers from the head of the feature port combination sequence;
marking the feature port combinations that correspond to the selected serial numbers;
identifying the ports contained in the marked feature port combinations as normal feature ports;
wherein each feature port combination corresponds to a unique serial number, and the feature port combination sequence consists of these unique serial numbers.
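The selection of normal feature ports can be sketched as follows. This is only an illustration under stated assumptions: the dictionary layout `{serial_number: (ports, first_conditional_probability)}`, the function name `normal_feature_ports`, the default proportion of 0.5, and rounding the head count up are all choices made here, not values given in the source.

```python
import math


def normal_feature_ports(combos, proportion=0.5):
    """Return (marked serial numbers, normal feature ports).

    combos: {serial_number: (ports, first_conditional_probability)} (assumed layout).
    proportion: preset share of the sequence taken from its head (assumed value).
    """
    # Sequence of serial numbers, ordered by descending first conditional probability.
    sequence = sorted(combos, key=lambda s: combos[s][1], reverse=True)
    # Number of serial numbers taken from the head; rounding up is an assumption.
    keep = math.ceil(len(sequence) * proportion)
    marked = sequence[:keep]
    # Every port contained in a marked combination is a normal feature port.
    ports = set()
    for serial in marked:
        ports.update(combos[serial][0])
    return marked, ports


combos = {1: ((1, 2), 0.5), 2: ((1, 2, 3), 0.3), 3: ((4, 5), 0.2)}
marked, ports = normal_feature_ports(combos)
```

With three combinations and a proportion of 0.5, two serial numbers are marked, so ports 1, 2, and 3 are treated as normal feature ports.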
Further, verifying the data transmission pattern of the normal feature port to determine whether to close the corresponding normal feature port comprises:
acquiring the current data interaction volume of each normal feature port, constructing a time-domain curve of the data interaction volume for each port, and determining normal data interaction features;
calling the feature behavior database to obtain the data interaction volume corresponding to each normal feature port, constructing a sample time-domain curve of the data interaction volume, and determining sample data interaction features;
comparing the normal data interaction features of each normal feature port with the sample data interaction features to determine a feature difference amount;
if the feature difference amount is greater than or equal to a preset feature difference threshold, determining that the normal feature port is to be closed;
wherein the normal data interaction features and the sample data interaction features each comprise the mean amplitude of the data interaction volume, the mean slope of the rising segments, and the mean slope of the falling segments.
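A minimal sketch of this verification is given below. The text names the three curve features but does not state how the feature difference amount is computed from them, so the sum of absolute differences used here is an assumption, as are the function names, the uniform sampling interval `dt`, and the example threshold.

```python
def _mean(values):
    """Arithmetic mean, or 0.0 for an empty sequence."""
    return sum(values) / len(values) if values else 0.0


def curve_features(samples, dt=1.0):
    """Return (mean amplitude, mean rising slope, mean falling slope)."""
    # Slope between consecutive samples of the time-domain curve.
    slopes = [(b - a) / dt for a, b in zip(samples, samples[1:])]
    rising = [s for s in slopes if s > 0]
    falling = [s for s in slopes if s < 0]
    return (_mean(samples), _mean(rising), _mean(falling))


def feature_difference(current, sample):
    """Assumed metric: sum of absolute differences over the three features."""
    return sum(abs(c - s) for c, s in zip(current, sample))


def should_close(current_samples, stored_samples, threshold):
    """Close the port if the difference reaches the preset threshold."""
    diff = feature_difference(curve_features(current_samples),
                              curve_features(stored_samples))
    return diff >= threshold


stored = [10, 20, 30, 20, 10]    # sample curve from the database
current = [10, 60, 110, 60, 10]  # curve observed during the current access
close_port = should_close(current, stored, threshold=50.0)
```

Here the observed curve has a much larger mean amplitude and steeper slopes than the stored sample, so the port would be closed under the assumed threshold.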
Further, the feedback information comprises a key, and if the key does not match a preset key, it is determined that an early warning signal is to be issued.
Compared with the prior art, the invention establishes a feature behavior database for the model training terminal based on regular features screened from instruction response data. When the model training terminal is accessed, the behavior fit characterization coefficient for the terminal is determined by comparing instruction response data collected in real time with the data in the feature behavior database, the feature fit category of the current model training terminal is identified, and the terminal's interaction is controlled in time. Under the weak feature fit category, the data transmission pattern of each normal feature port is verified to determine whether the corresponding normal feature port should be closed.
In particular, the invention establishes the feature behavior database for the model training terminal based on regular features of the instruction response data. In practice, because of the operating habits of an authorized operator, the instructions of the corresponding instruction input device show a certain regularity when the model training terminal is accessed over a long time: for example, some functional components of the terminal are triggered frequently while others are rarely triggered. As a result, the terminal's data interaction with each port is also regular: for example, data interaction occurs with specific ports, and the data interaction volume stays within a certain range. The invention therefore treats feature port combinations and feature functional component combinations as regular features and builds a dedicated feature behavior database, which provides data support for the subsequent identification of the feature fit category, allows the terminal's data interaction to be controlled adaptively, safeguards security, and reduces the risk of data leakage.
In particular, the invention calculates the behavior fit characterization coefficient and establishes the feature fit category of the model training terminal. The coefficient characterizes the difference between the regularity reflected by the instruction response data while the terminal is being accessed and the regularity reflected by the instruction response data in the normal state. This makes it possible to determine in time whether the current model training terminal falls into the weak feature fit category, to identify access by a visitor masquerading as an authorized visitor, to intervene in time, and to control the terminal's data interaction adaptively, thereby safeguarding security.
In particular, the invention identifies normal feature ports. In practice, the normal feature ports are determined from the feature behavior database and represent the port combinations with higher data interaction frequency in the normal state, which therefore carry a higher intrusion risk. Furthermore, the invention determines in time whether to close a normal feature port based on that port's transmission pattern; the verification considers both the data interaction volume of the normal feature port and how that volume changes. In practice, if a masquerading operator accesses the model training terminal to steal data, interfere with data interaction, or contaminate the model by feeding data into it, this is reflected in the transmission pattern. The invention therefore identifies such situations and closes the transmission port in time, safeguarding the model training terminal and reducing the risk of data leakage.
Drawings
FIG. 1 is a schematic diagram of the steps of a method for managing large model data leakage according to an embodiment of the invention;
FIG. 2 is a logic decision diagram for identifying feature fit categories of a current model training terminal according to an embodiment of the invention;
FIG. 3 is a logic decision diagram of an embodiment of the present invention for determining whether to close a corresponding normal feature port;
FIG. 4 is a logic diagram of an embodiment of the present invention for determining whether to issue an early warning signal.
Detailed Description
The invention will be further described with reference to examples for the purpose of making the objects and advantages of the invention more apparent, it being understood that the specific examples described herein are given by way of illustration only and are not intended to be limiting.
Preferred embodiments of the present invention are described below with reference to the accompanying drawings. It should be understood by those skilled in the art that these embodiments are merely for explaining the technical principles of the present invention, and are not intended to limit the scope of the present invention.
Referring to FIGS. 1 to 4, FIG. 1 is a schematic diagram of the steps of the governance method for large model data leakage according to an embodiment of the invention, FIG. 2 is a logic decision diagram for identifying the feature fit category of the current model training terminal, FIG. 3 is a logic decision diagram for determining whether to close a corresponding normal feature port, and FIG. 4 is a logic decision diagram for determining whether to issue an early warning signal. The governance method for large model data leakage of this embodiment comprises:
S1, acquiring instruction response data of a model training terminal within a historical period, wherein the instruction response data comprise the data interaction volume between the model training terminal and each port and the instruction trigger targets of the corresponding instruction input device;
S2, screening regular features based on the instruction response data to establish a feature behavior database for the model training terminal, wherein the regular features comprise feature port combinations and feature functional component combinations;
S3, in response to the model training terminal being accessed, collecting instruction response data of the model training terminal within a period and comparing them with the data in the feature behavior database;
S4, determining, according to the comparison result, a behavior fit characterization coefficient for the model training terminal so as to identify the feature fit category of the current model training terminal;
S5, controlling data interaction of the model training terminal based on the feature fit category of the current access behavior, comprising:
calling the feature behavior database to identify normal feature ports, verifying the data transmission pattern of each normal feature port to determine whether to close the corresponding normal feature port, and simultaneously sending verification request information to the model training terminal so as to determine, according to the terminal's feedback information, whether to issue an early warning signal;
or continuing to collect, within each period, the interaction data of the model training terminal and the instruction data of the corresponding instruction input device to determine the feature fit category of the model training terminal.
Specifically, the structure of the model training terminal is not limited. It is a logic component capable of executing logical operations and only needs to be able to load or store a model for training, for example a computer; this is not described further.
Specifically, in implementation, the instruction input device is paired with the model training terminal and is used to input instructions to the terminal to realize corresponding functions; examples include a keyboard and a mouse.
Specifically, a functional component is configured in the model training terminal to realize a corresponding function, for example a piece of software on the terminal. In general, a functional component can be triggered by a control instruction from the instruction input device to realize its function; this is not described further.
Specifically, the model training terminal being accessed includes a visitor accessing the terminal and controlling it through the instruction input device.
Specifically, in a computer network, a port is a logical communication endpoint used to distinguish the different network services or processes running on a terminal. Each port is identified by a unique port number, a 16-bit number ranging from 0 to 65535; this is not described further.
Specifically, in step S2, the process of screening feature port combinations based on the instruction response data comprises:
determining a first conditional probability that the model training terminal interacts with each port combination over a plurality of access sessions, where a single access session runs from the moment a visitor starts accessing the model training terminal to the moment the visitor stops accessing it;
it can be understood that, within a single access session, the model training terminal may interact with different ports depending on the control instructions issued.
Take a total of 5 ports as an example, with port serial numbers 1, 2, 3, 4, and 5.
If the current port combination is 1, 2, and 3, and the model training terminal has been accessed 10 times, of which 3 times were under the current port combination, then the first conditional probability of this port combination is 3/10, that is, the ratio of the number of accesses under the current port combination to the total number of accesses;
determining feature port combinations based on the first conditional probability of each port combination, recording the mean data interaction volume of each port in the feature port combination, and storing it in the feature behavior database;
wherein a port combination is determined to be a feature port combination if its first conditional probability is greater than a predetermined port combination conditional probability threshold.
Specifically, the port combination conditional probability threshold is determined from the mean of the first conditional probabilities of all port combinations: it is set to the product of that mean and a precision coefficient, where the precision coefficient is selected from the interval [1.05, 1.15].
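The screening described above can be sketched in Python. The function name `screen_feature_port_combinations`, the session layout (one set of port numbers per access session), and the sample data are assumptions made for illustration; only the 3/10 example, the mean-times-precision threshold, and the [1.05, 1.15] interval come from the text.

```python
from collections import Counter


def screen_feature_port_combinations(sessions, precision=1.10):
    """Return (probabilities, threshold, feature port combinations).

    sessions: one collection of port numbers per access session (assumed layout).
    precision: precision coefficient, selected from the interval [1.05, 1.15].
    """
    if not sessions:
        raise ValueError("at least one access session is required")
    # Count how many sessions used exactly each port combination.
    counts = Counter(frozenset(ports) for ports in sessions)
    total = len(sessions)
    # First conditional probability: sessions under the combination / all sessions.
    probs = {combo: n / total for combo, n in counts.items()}
    # Threshold: mean first conditional probability times the precision coefficient.
    threshold = precision * sum(probs.values()) / len(probs)
    features = {combo for combo, p in probs.items() if p > threshold}
    return probs, threshold, features


# 10 sessions: 3 under ports {1, 2, 3}, 5 under {1, 2}, 2 under {4, 5}.
sessions = [{1, 2, 3}] * 3 + [{1, 2}] * 5 + [{4, 5}] * 2
probs, threshold, features = screen_feature_port_combinations(sessions)
```

In this sample, the combination 1, 2, 3 has a first conditional probability of 3/10 as in the example above, the threshold is about 0.367, and only the combination 1, 2 (probability 5/10) exceeds it.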
Specifically, in step S2, the process of screening feature functional component combinations based on the instruction response data comprises:
acquiring a second conditional probability that each functional component combination is triggered over a plurality of access sessions of the model training terminal;
it can be understood that the second conditional probability is the ratio of the number of accesses under the functional component combination to the total number of accesses of the model training terminal;
determining feature functional component combinations based on the second conditional probability of each functional component combination, and recording the trigger frequency of each functional component in the feature functional component combination, where the trigger frequency is the number of times the component is triggered within a reference time; so that the collected instruction response data are representative, the reference time may be selected from 3 min to 10 min;
wherein a functional component combination is determined to be a feature functional component combination if its second conditional probability is greater than a predetermined functional component combination probability threshold.
Specifically, in implementation, the functional component combination probability threshold is determined from the mean of the second conditional probabilities with which the functional component combinations are triggered, and is set to the product of that mean and the precision coefficient.
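The second conditional probability is computed in the same way as the first, so the sketch below covers only the trigger frequency, that is, the number of triggers of each functional component within the reference time. The event layout `(timestamp_seconds, component_name)`, the function name `trigger_frequencies`, and the component names are hypothetical.

```python
from collections import Counter


def trigger_frequencies(events, window_start, reference_seconds=300):
    """Count triggers per functional component within the reference time.

    events: iterable of (timestamp_seconds, component_name) pairs (assumed layout).
    reference_seconds: reference time; the text suggests 3 min to 10 min.
    """
    window_end = window_start + reference_seconds
    return Counter(name for t, name in events if window_start <= t < window_end)


events = [(0, "editor"), (10, "editor"), (200, "trainer"), (400, "editor")]
freq = trigger_frequencies(events, window_start=0)
```

With a 5-minute reference time, the event at 400 s falls outside the window, so `editor` is counted twice and `trainer` once.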
The invention establishes the feature behavior database for the model training terminal based on regular features of the instruction response data. In practice, because of the operating habits of an authorized operator, the instructions of the corresponding instruction input device show a certain regularity when the model training terminal is accessed over a long time: for example, some functional components of the terminal are triggered frequently while others are rarely triggered. As a result, the terminal's data interaction with each port is also regular: for example, data interaction occurs with specific ports, and the data interaction volume stays within a certain range. The invention therefore treats feature port combinations and feature functional component combinations as regular features and builds a dedicated feature behavior database, which provides data support for the subsequent identification of the feature fit category, allows the terminal's data interaction to be controlled adaptively, safeguards security, and reduces the risk of data leakage.
Specifically, in step S3, the process of comparing the instruction response data collected from the model training terminal within a period with the data in the feature behavior database comprises:
recording the ports currently generating data interaction and, if the current port combination exists in the feature behavior database, determining that the port combination matches and recording the current mean data interaction volume of each port, where it can be understood that the data interaction volume of each port may be recorded for each time period within a certain time and then averaged;
calculating, for each port, the difference between the current mean data interaction volume and the mean data interaction volume stored in the feature behavior database, and averaging the resulting differences to obtain a data interaction volume matching feature;
recording the functional components currently triggered and, if the current functional component combination exists in the feature behavior database, determining that the functional component combination matches and recording the current trigger frequency of each functional component;
calculating, for each functional component, the difference between the current trigger frequency and the trigger frequency stored in the feature behavior database, and averaging the resulting differences to obtain a trigger frequency matching feature.
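Both matching features follow the same pattern: a per-item difference against the stored value, then an average. The sketch below assumes the difference is taken as an absolute value, which the text does not state explicitly; the function name `matching_feature` and the sample numbers are also illustrative.

```python
def matching_feature(current, stored):
    """Mean absolute difference between current and stored values.

    current, stored: {port_or_component: value} for a matched combination.
    Taking the absolute value of each difference is an assumption.
    """
    if not current or current.keys() != stored.keys():
        raise ValueError("the combination must match the stored one")
    return sum(abs(current[k] - stored[k]) for k in current) / len(current)


# Data interaction volume matching feature (mean volume per port).
volume_feature = matching_feature({1: 120.0, 2: 80.0, 3: 40.0},
                                  {1: 100.0, 2: 90.0, 3: 40.0})

# Trigger frequency matching feature (triggers per reference time).
frequency_feature = matching_feature({"editor": 12, "trainer": 4},
                                     {"editor": 10, "trainer": 5})
```

A smaller matching feature means the current access behaves more like the stored profile; the two sample values here are 10.0 and 1.5.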
Specifically, in step S4, the process of determining the behavior fit characterization coefficient corresponding to the model training terminal according to the comparison result comprises:
calculating the ratio of the predetermined reference data interaction amount matching threshold to the data interaction amount matching feature to obtain a first fit characterization factor;
calculating the ratio of the predetermined reference trigger frequency matching threshold to the trigger frequency matching feature to obtain a second fit characterization factor;
determining the sum of the first fit characterization factor and the second fit characterization factor as the fit characterization coefficient.
Specifically, the reference data interaction amount matching threshold is determined from the data interaction amount mean of the port stored in the characteristic behavior database, and is typically set to between 0.25 and 0.35 times that mean.
Specifically, the reference trigger frequency matching threshold is determined from the trigger frequency of the functional component stored in the characteristic behavior database, and is typically set to between 0.15 and 0.25 times that trigger frequency.
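A short worked sketch of the coefficient calculation in step S4 is given below. The function name, the use of a single representative stored mean and stored trigger frequency, the default ratios (0.25 and 0.2, chosen from the stated ranges), and the small `eps` guard against division by zero are assumptions made for illustration; the text does not specify how a zero matching feature is handled.

```python
def fit_coefficient(interaction_feature, frequency_feature,
                    stored_interaction_mean, stored_trigger_frequency,
                    interaction_ratio=0.25, frequency_ratio=0.2, eps=1e-9):
    """Sum of the first and second fit characterization factors."""
    # Reference thresholds are fractions of the stored values.
    interaction_threshold = interaction_ratio * stored_interaction_mean
    frequency_threshold = frequency_ratio * stored_trigger_frequency
    # Assumption: clamp the matching features so a perfect match
    # (difference of zero) does not divide by zero.
    first_factor = interaction_threshold / max(interaction_feature, eps)
    second_factor = frequency_threshold / max(frequency_feature, eps)
    return first_factor + second_factor

# Stored mean 100 and matching feature 10 give 25 / 10 = 2.5;
# stored frequency 20 and matching feature 2 give 4 / 2 = 2.0.
print(fit_coefficient(10.0, 2.0, 100.0, 20.0))  # 4.5
```

A smaller matching feature (a closer match to the stored behavior) yields a larger factor, so a larger coefficient indicates a closer fit.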
In this method, the fit characterization coefficient is calculated and used to determine the feature fit category of the model training terminal. The coefficient characterizes the difference between the regularity shown by the instruction response data during the current access and the regularity shown by the instruction response data in the normal state. This makes it possible to determine promptly whether the current model training terminal falls into the weak feature fit category, to identify access by someone disguised as an authorized visitor, and to intervene in time, so that the data interaction of the model training terminal is controlled adaptively and security is ensured.
Specifically, in step S4, the process of identifying the feature fit category of the current model training terminal comprises:
if the fit characterization coefficient is greater than or equal to a predetermined fit characterization coefficient threshold, determining that the current model training terminal belongs to the strong feature fit category;
if the fit characterization coefficient is less than the predetermined fit characterization coefficient threshold, determining that the current model training terminal belongs to the weak feature fit category.
Specifically, the fit characterization coefficient threshold is derived from the first fit characterization factor and the second fit characterization factor. It can be understood that when the first and second fit characterization factors are each close to 1, the deviation is approaching the acceptable upper limit. To characterize this upper limit of deviation, a person skilled in the art can select the fit characterization coefficient threshold within the interval [2.15, 2.25].
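The classification rule can be written as a one-line comparison. In this sketch the function name, the category labels, and the default threshold of 2.2 (one value inside the stated interval [2.15, 2.25]) are illustrative assumptions.

```python
def feature_fit_category(coefficient, threshold=2.2):
    """Classify a terminal as 'strong' or 'weak' feature fit."""
    # Greater than or equal to the threshold counts as a strong fit.
    return "strong" if coefficient >= threshold else "weak"

print(feature_fit_category(4.5))  # strong
print(feature_fit_category(2.0))  # weak
```

When both factors equal exactly 1 the coefficient is 2, which falls just below any threshold in the interval and is therefore classified as a weak fit.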
Specifically, in step S5, controlling the data interaction of the model training terminal based on the feature fit category of the current access behavior comprises:
if the current model training terminal belongs to the weak feature fit category, calling the characteristic behavior database to identify normal characteristic ports, verifying the data transmission pattern of the normal characteristic ports, determining whether to close the corresponding normal characteristic port, and simultaneously sending verification requirement information to the model training terminal, so as to determine from the terminal's feedback information whether to issue an early warning signal;
if the current model training terminal belongs to the strong feature fit category, continuing to collect, within the period, the interaction data of the model training terminal and the instruction data of the corresponding instruction input device in order to determine the feature fit category of the model training terminal.
Specifically, in step S5, the process of calling the characteristic behavior database to identify the normal characteristic ports comprises:
arranging the characteristic port combinations in descending order of their corresponding first conditional probabilities to obtain a characteristic port combination sequence;
selecting a predetermined proportion of characteristic port combination serial numbers from the head of the characteristic port combination sequence;
marking the characteristic port combinations corresponding to the selected serial numbers;
identifying the ports contained in the marked characteristic port combinations as normal characteristic ports;
wherein each characteristic port combination corresponds to a unique serial number, and the characteristic port combination sequence consists of these unique serial numbers.
Specifically, so that the more strongly representative data at the head of the characteristic port combination sequence is selected, the predetermined proportion is chosen within the range of 15% to 30%.
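The port identification procedure above can be sketched as follows. The input layout (a mapping from port combination to its first conditional probability), the function name, and the rounding rule (round up, keep at least one combination) are assumptions; the text only fixes the descending sort, the head-of-sequence selection, and the 15% to 30% proportion.

```python
import math

def normal_characteristic_ports(combo_probability, proportion=0.3):
    """Return the ports in the top-ranked characteristic port combinations.

    combo_probability: {frozenset(ports): first conditional probability}
    """
    # Give every characteristic port combination a unique serial number.
    serial_to_combo = dict(enumerate(combo_probability))
    # The sequence holds serial numbers, ordered by descending probability.
    sequence = sorted(serial_to_combo,
                      key=lambda s: combo_probability[serial_to_combo[s]],
                      reverse=True)
    # Assumption: round up and always keep at least one combination.
    keep = max(1, math.ceil(len(sequence) * proportion))
    ports = set()
    for serial in sequence[:keep]:
        ports |= serial_to_combo[serial]
    return ports

# Hypothetical characteristic port combinations and probabilities.
combos = {
    frozenset({"p1", "p2"}): 0.90,
    frozenset({"p3"}): 0.80,
    frozenset({"p2", "p4"}): 0.70,
    frozenset({"p5"}): 0.65,
    frozenset({"p6"}): 0.60,
}
print(sorted(normal_characteristic_ports(combos)))  # ['p1', 'p2', 'p3']
```

With five combinations and a proportion of 0.3, two combinations are kept, so the ports of the two most probable combinations are returned.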
Specifically, in step S5, the process of verifying the data transmission pattern of the normal characteristic ports and determining whether to close the corresponding normal characteristic port comprises:
acquiring the current data interaction amount of each normal characteristic port, constructing a data interaction amount time domain curve for each, and determining the regular data interaction features;
calling the characteristic behavior database to acquire the data interaction amount corresponding to each normal characteristic port, constructing a data interaction amount sample time domain curve, and determining the sample data interaction features;
comparing the regular data interaction features of each normal characteristic port with the sample data interaction features to determine a feature difference amount;
if the feature difference amount is greater than or equal to a predetermined feature difference amount threshold, determining that the normal characteristic port is to be closed;
wherein the regular data interaction features and the sample data interaction features each comprise the average amplitude of the data interaction amount, the average slope of the rising segments, and the average slope of the falling segments.
Specifically, determining the feature difference amount from the comparison of the regular data interaction features with the sample data interaction features comprises:
determining the difference between the average amplitude of the data interaction amount in the regular data interaction features and that in the sample data interaction features, and dividing the difference by the average amplitude of the data interaction amount in the sample data interaction features to obtain a first feature difference amount;
determining the difference between the average slope of the rising segments in the regular data interaction features and that in the sample data interaction features, and dividing the difference by the average slope of the rising segments in the sample data interaction features to obtain a second feature difference amount;
determining the difference between the average slope of the falling segments in the regular data interaction features and that in the sample data interaction features, and dividing the difference by the average slope of the falling segments in the sample data interaction features to obtain a third feature difference amount;
taking the sum of the first feature difference amount, the second feature difference amount, and the third feature difference amount as the feature difference amount.
It can be appreciated that the regular data interaction features and the sample data interaction features can be extracted from the corresponding time domain curves, which is not described further here.
It can be understood that the data interaction amount of a normal characteristic port is allowed a certain deviation, and the feature difference amount threshold is set to characterize the case in which the deviation is large; a person skilled in the art can therefore select the feature difference amount threshold within the interval [0.65, 0.85].
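The port verification step can be illustrated with a small sketch that extracts the three features from a sampled time domain curve and sums the three relative differences. Several details are assumptions not stated in the text: the curve is a list of equally spaced samples, slopes are taken between adjacent samples, absolute values are used for the differences, a feature whose sample value is zero is skipped, and the threshold of 0.75 is one value from the stated interval [0.65, 0.85].

```python
from statistics import mean

def curve_features(samples, dt=1.0):
    """Return (average amplitude, mean rising slope, mean falling slope)."""
    slopes = [(b - a) / dt for a, b in zip(samples, samples[1:])]
    rising = [s for s in slopes if s > 0]
    falling = [s for s in slopes if s < 0]
    return (mean(samples),
            mean(rising) if rising else 0.0,
            mean(falling) if falling else 0.0)

def feature_difference(current, sample):
    """Sum of the three relative differences against the sample features."""
    # Assumption: absolute relative differences; zero sample values skipped.
    return sum(abs(c - s) / abs(s) for c, s in zip(current, sample) if s != 0)

def should_close_port(current_curve, sample_curve, threshold=0.75):
    diff = feature_difference(curve_features(current_curve),
                              curve_features(sample_curve))
    return diff >= threshold

sample_curve = [10, 20, 10, 20, 10]   # hypothetical stored behavior
current_curve = [10, 30, 10, 30, 10]  # hypothetical current behavior
print(should_close_port(current_curve, sample_curve))  # True
print(should_close_port(sample_curve, sample_curve))   # False
```

In this example the average amplitude rises from 14 to 18 and both slopes double, so the summed difference is 4/14 + 1 + 1, well above the threshold.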
The invention identifies the normal characteristic ports. In practice, these ports are determined from the characteristic behavior database and represent the port combinations with the highest data interaction frequency in the normal state, which therefore carry a higher intrusion risk. The invention further determines in time whether to close a normal characteristic port based on its transmission pattern; verification of the transmission pattern takes into account both the data interaction amount of the port and how that amount changes. In practice, if a disguised operator accesses the model training terminal to steal data, interfere with data interaction, or contaminate the model by feeding data into it, this is reflected in the transmission pattern. The invention recognizes such situations and closes the transmission port in time, ensuring the security of the model training terminal and reducing the risk of data leakage.
Specifically, the feedback information includes a key; if the key does not match a predetermined key, it is determined that an early warning signal is to be generated.
It can be understood that the verification requirement information includes a request sent to the model training terminal to provide the key.
The key may be preset and may use asymmetric or symmetric encryption; a person skilled in the art can verify whether the key matches according to the corresponding encryption mode, which is not described further here.
The generated early warning signal can be sent to a monitoring end for monitoring, which is not described further here.
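For the simplest case of a preset shared key, the check on the feedback information might look like the sketch below. The function name and the choice of a constant-time comparison via Python's standard `hmac.compare_digest` are assumptions; the text leaves the verification scheme to the implementer and equally allows asymmetric schemes, which are not shown here.

```python
import hmac

def should_raise_warning(feedback_key: bytes, expected_key: bytes) -> bool:
    """Return True when the key in the feedback does not match the preset key."""
    # compare_digest compares in constant time to limit timing side channels.
    return not hmac.compare_digest(feedback_key, expected_key)

print(should_raise_warning(b"wrong-key", b"preset-key"))   # True
print(should_raise_warning(b"preset-key", b"preset-key"))  # False
```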
The governance method for large model data leakage of the present invention, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention may be embodied, in whole or in part, in the form of a software product stored in a storage medium, including instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the method according to the embodiments of the present invention. The storage medium includes a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or any other medium capable of storing program code.
Thus far, the technical solution of the present invention has been described in connection with the preferred embodiments shown in the drawings, but it is easily understood by those skilled in the art that the scope of protection of the present invention is not limited to these specific embodiments. Equivalent modifications and substitutions for related technical features may be made by those skilled in the art without departing from the principles of the present invention, and such modifications and substitutions will be within the scope of the present invention.

Claims (10)

1. A governance method for large model data leakage, characterized by comprising:
obtaining instruction response data of a model training terminal within a historical period, including the data interaction amount between the model training terminal and each port and the instruction trigger targets of the corresponding instruction input device;
screening regular features based on the instruction response data to establish a characteristic behavior database for the model training terminal, wherein the regular features include characteristic port combinations and characteristic functional component combinations;
in response to the model training terminal being accessed, comparing the instruction response data of the model training terminal collected within the period with the data in the characteristic behavior database;
determining, according to the comparison result, the behavior fit characterization coefficient corresponding to the model training terminal, so as to identify the feature fit category of the current model training terminal;
controlling the data interaction of the model training terminal based on the feature fit category of the current access behavior, including:
calling the characteristic behavior database to identify normal characteristic ports, verifying the data transmission pattern of the normal characteristic ports, determining whether to close the corresponding normal characteristic port, and simultaneously sending verification requirement information to the model training terminal, so as to determine from the feedback information of the model training terminal whether to issue an early warning signal;
or continuing to collect, within the period, the interaction data of the model training terminal and the instruction data of the corresponding instruction input device to determine the feature fit category of the model training terminal.

2. The governance method for large model data leakage according to claim 1, characterized in that the process of screening characteristic port combinations based on the instruction response data comprises:
determining a first conditional probability that the model training terminal interacts with each different port combination over a number of accesses;
determining the characteristic port combinations based on the first conditional probability corresponding to each port combination, recording the data interaction amount mean of each port in the characteristic port combination during data interaction, and storing it in the characteristic behavior database;
wherein, if the first conditional probability corresponding to a port combination is greater than a predetermined port combination conditional probability threshold, the port combination is determined to be a characteristic port combination.

3. The governance method for large model data leakage according to claim 1, characterized in that the process of screening characteristic functional component combinations based on the instruction response data comprises:
obtaining a second conditional probability that each functional component combination is triggered over a number of accesses to the model training terminal;
determining the characteristic functional component combinations based on the second conditional probability corresponding to each functional component combination, and recording the trigger frequency of each functional component in the characteristic functional component combination;
wherein, if the second conditional probability corresponding to a functional component combination is greater than a predetermined functional component combination probability threshold, the functional component combination is determined to be a characteristic functional component combination.

4. The governance method for large model data leakage according to claim 1, characterized in that the process of comparing the instruction response data of the model training terminal collected within the period with the data in the characteristic behavior database comprises:
recording the ports currently generating data interaction; if the current port combination exists in the characteristic behavior database, determining that the port combination matches, and recording the current data interaction amount mean corresponding to each port;
calculating, for each port, the difference between the current data interaction amount mean and the data interaction amount mean stored in the characteristic behavior database, and averaging the resulting differences to obtain the data interaction amount matching feature;
recording the currently triggered functional components; if the current functional component combination exists in the characteristic behavior database, determining that the functional component combination matches, and recording the current trigger frequency of each functional component;
calculating, for each functional component, the difference between the current trigger frequency and the trigger frequency stored in the characteristic behavior database, and averaging the resulting differences to obtain the trigger frequency matching feature.

5. The governance method for large model data leakage according to claim 4, characterized in that the process of determining the behavior fit characterization coefficient corresponding to the model training terminal according to the comparison result comprises:
calculating the ratio of a predetermined reference data interaction amount matching threshold to the data interaction amount matching feature to obtain a first fit characterization factor;
calculating the ratio of a predetermined reference trigger frequency matching threshold to the trigger frequency matching feature to obtain a second fit characterization factor;
determining the sum of the first fit characterization factor and the second fit characterization factor as the fit characterization coefficient.

6. The governance method for large model data leakage according to claim 1, characterized in that the process of identifying the feature fit category of the current model training terminal comprises:
if the fit characterization coefficient is greater than or equal to a predetermined fit characterization coefficient threshold, determining that the current model training terminal belongs to the strong feature fit category;
if the fit characterization coefficient is less than the predetermined fit characterization coefficient threshold, determining that the current model training terminal belongs to the weak feature fit category.

7. The governance method for large model data leakage according to claim 6, characterized in that controlling the data interaction of the model training terminal based on the feature fit category of the current access behavior comprises:
if the current model training terminal belongs to the weak feature fit category, calling the characteristic behavior database to identify normal characteristic ports, verifying the data transmission pattern of the normal characteristic ports, determining whether to close the corresponding normal characteristic port, and simultaneously sending verification requirement information to the model training terminal, so as to determine from the feedback information of the model training terminal whether to issue an early warning signal;
if the current model training terminal belongs to the strong feature fit category, continuing to collect, within the period, the interaction data of the model training terminal and the instruction data of the corresponding instruction input device to determine the feature fit category of the model training terminal.

8. The governance method for large model data leakage according to claim 2, characterized in that the process of calling the characteristic behavior database to identify normal characteristic ports comprises:
arranging the characteristic port combinations in descending order of their corresponding first conditional probabilities to obtain a characteristic port combination sequence;
selecting a predetermined proportion of characteristic port combination serial numbers from the head of the characteristic port combination sequence;
marking the characteristic port combinations corresponding to the selected serial numbers;
identifying the ports contained in the marked characteristic port combinations as normal characteristic ports;
wherein each characteristic port combination corresponds to a unique serial number, and the characteristic port combination sequence consists of these unique serial numbers.

9. The governance method for large model data leakage according to claim 1, characterized in that the process of verifying the data transmission pattern of the normal characteristic ports and determining whether to close the corresponding normal characteristic port comprises:
acquiring the current data interaction amount of each normal characteristic port, constructing a data interaction amount time domain curve for each, and determining the regular data interaction features;
calling the characteristic behavior database to acquire the data interaction amount corresponding to each normal characteristic port, constructing a data interaction amount sample time domain curve, and determining the sample data interaction features;
comparing the regular data interaction features of each normal characteristic port with the sample data interaction features to determine a feature difference amount;
if the feature difference amount is greater than or equal to a predetermined feature difference amount threshold, determining that the normal characteristic port is to be closed;
wherein the regular data interaction features and the sample data interaction features each include the average amplitude of the data interaction amount, the average slope of the rising segments, and the average slope of the falling segments.

10. The governance method for large model data leakage according to claim 1, characterized in that the feedback information includes a key, and if the key does not match a predetermined key, it is determined that an early warning signal is to be generated.
CN202411686760.2A 2024-11-25 2024-11-25 A governance method for large model data leakage Active CN119203244B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202411686760.2A CN119203244B (en) 2024-11-25 2024-11-25 A governance method for large model data leakage

Publications (2)

Publication Number Publication Date
CN119203244A (en) 2024-12-27
CN119203244B (en) 2025-03-25

Family

ID=94076391

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202411686760.2A Active CN119203244B (en) 2024-11-25 2024-11-25 A governance method for large model data leakage

Country Status (1)

Country Link
CN (1) CN119203244B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140098705A1 (en) * 2010-12-30 2014-04-10 Adaptive Spectrum And Signal Alignment, Inc. Management center for communication system customer premises equipment
CN109508485A (en) * 2018-10-30 2019-03-22 平安医疗健康管理股份有限公司 A kind of data processing model dissemination method, device, server and storage medium
CN117633626A (en) * 2023-12-04 2024-03-01 中国建设银行股份有限公司 Model updating method, device and computer equipment
CN118467930A (en) * 2024-07-09 2024-08-09 西安传显行风网络科技有限公司 Abnormal data processing method applied to robot

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN119449454A (en) * 2024-11-28 2025-02-14 北京独创时代科技有限公司 Information security detection management system
CN119449454B (en) * 2024-11-28 2025-10-31 北京独创时代科技有限公司 An information security detection and management system
CN119922007A (en) * 2025-01-24 2025-05-02 北京和润诚科技有限公司 A data transmission protection method
CN119922007B (en) * 2025-01-24 2025-09-12 北京和润诚科技有限公司 Data transmission protection method

Also Published As

Publication number Publication date
CN119203244B (en) 2025-03-25

Similar Documents

Publication Publication Date Title
CN119203244B (en) A governance method for large model data leakage
EP3719678B1 (en) Identity verification method and apparatus
CN116112292B (en) Abnormal behavior detection method, system and medium based on network flow big data
CN110348188B (en) Core body checking method and device
CN110214322A (en) System and method for protecting the access to resource
CN104836781A (en) Method distinguishing identities of access users, and device
CN116915515B (en) Access security control method and system for industrial control network
CN119324835B (en) Intelligent conference system
CN113962712A (en) Method for predicting fraud gangs and related equipment
CN114996746A (en) Data authority management method and system based on multi-dimensional information
KR102230441B1 (en) Method, Device and program for generating security action report based on the results of the security vulnerability assessment
CN120354405A (en) Computer defending system based on Internet of things
CN114733207A (en) Game account monitoring, analyzing, early warning and managing system based on feature analysis
CN120768661A (en) A system and method for network security evaluation based on artificial intelligence
CN118332607B (en) Financial big data analysis system and method based on blockchain
CN120785656B (en) An automated test case construction method and system for testing proprietary protocols
CN119172178B (en) Mobile office equipment remote monitoring management method based on Internet of things
CN120915535A (en) Network information consultation platform based on zero trust architecture and communication method
CN112149036A (en) Method and system for identifying batch abnormal interaction behaviors
CN118551368B (en) A method and system for character instruction intention recognition
CN121037124B (en) Shared data dynamic security management method, system, equipment and medium
CN120105488A (en) A full lifecycle data protection approach
CN120524527A (en) A Binlog-based monitoring method for illegally tampering with system data
CN119102417A (en) Smart door lock password anti-peeping method, system, electronic device and storage medium
CN121327513A (en) Training Methods for Dynamic Tracking Models of User Preferences Applicable to Electronic Signature Systems

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant