CN119203244A - A governance method for large model data leakage

Info

Publication number: CN119203244A
Application number: CN202411686760.2A
Authority: CN
Inventors: 孙伟; 赵正伟; 周红艳; 刘莹莹; 张垚
Original assignee: Zhonghe Yunke Information Technology Group Co ltd
Current assignee: Zhonghe Yunke Information Technology Group Co ltd
Priority date: 2024-11-25
Filing date: 2024-11-25
Publication date: 2024-12-27
Anticipated expiration: 2044-11-25
Also published as: CN119203244B

Abstract

The invention relates to the field of data management, and in particular to a governance method for large model data leakage. Regularity features are screened from instruction response data to establish a characteristic behavior database for a model training terminal. When the model training terminal is accessed, instruction response data acquired in real time is compared with the data in the characteristic behavior database to determine a behavior fit characterization coefficient corresponding to the model training terminal, the characteristic fit category of the current model training terminal is identified, and the interaction of the model training terminal is controlled in time. Under the weak characteristic fit category, the data transmission rule of each normal characteristic port is verified to determine whether the corresponding normal characteristic port is to be closed.

Description

Governance method for large model data leakage

Technical Field

The invention relates to the field of data management, and in particular to a governance method for large model data leakage.

Background

With the development of computer and internet technology, data models built on different architectures are widely applied in various fields, and access terminals are widely used in those applications. Such terminals, especially terminals used for model training, may hold a large amount of sensitive data, so terminal security is important, and technologies for terminal security and data leakage prevention are receiving increasing attention.

For example, CN114676456A discloses a data privacy protection method, device and storage medium based on edge computing, and relates to the technical field of information security. The method comprises: acquiring a plurality of training sub-models in a federated machine learning mechanism, wherein each training sub-model is obtained by a different end user training an initial model on a plurality of original data, the plurality of original data being collected data of the different end users; training a target training sub-model in a target edge device according to the model parameters to obtain a target training model; and detecting, according to the target training model, whether each original data carries a data leakage risk. That application addresses the high risk of data leakage in the related art.

However, when a terminal is accessed by authorized visitors, differences in habit mean that the terminal's data interaction and the instructions of the corresponding instruction input device show a certain regularity in the data dimension. The prior art does not use this regularity to identify abnormal access in time, so terminal security is low and data is easily leaked.

Disclosure of Invention

When a terminal is accessed by authorized visitors, differences in habit mean that the terminal's data interaction while accessed and the instructions of the corresponding instruction input device show a certain regularity in the data dimension, yet the prior art does not identify abnormal access in time according to this regularity, so terminal security is low and data is easily leaked. To solve this problem, the invention provides a governance method for large model data leakage, which comprises the following steps:

Acquiring instruction response data in a history period of a model training terminal, wherein the instruction response data comprises data interaction amounts of the model training terminal and each port and instruction triggering targets corresponding to instruction input equipment;

Screening regularity features based on the instruction response data to establish a characteristic behavior database for the model training terminal, wherein the regularity features comprise characteristic port combinations and characteristic functional component combinations;

In response to the model training terminal being accessed, comparing instruction response data of the model training terminal within an acquisition period with the data in the characteristic behavior database;

Determining a behavior fit characterization coefficient corresponding to the model training terminal according to the comparison result, so as to identify the characteristic fit category of the current model training terminal;

Controlling data interaction of the model training terminal based on the characteristic fit category of the current access behavior, including,

Invoking the characteristic behavior database to identify a normal characteristic port, verifying a data transmission rule of the normal characteristic port, determining whether to close the corresponding normal characteristic port, and synchronously sending verification requirement information to the model training terminal so as to determine whether to send an early warning signal according to feedback information of the model training terminal;

or continuing to collect, period by period, the interaction data of the model training terminal and the instruction data of the corresponding instruction input device so as to determine the characteristic fit category of the model training terminal.

Further, the process of screening feature port combinations based on the instruction response data includes,

Determining a first conditional probability of data interaction between the model training terminal and each of the different port combinations over a plurality of accesses;

Determining a characteristic port combination based on a first conditional probability corresponding to each port combination, recording a data interaction quantity average value of each port data interaction in the characteristic port combination, and storing the data interaction quantity average value in a characteristic behavior database;

And if the first conditional probability corresponding to the port combination is greater than a predetermined port combination conditional probability threshold, judging the port combination as a characteristic port combination.

Further, the process of screening characteristic functional component combinations based on the instruction response data includes,

Acquiring a second conditional probability that each functional component combination is triggered in the process of accessing the model training terminal for a plurality of times;

Determining a characteristic functional component combination based on a second conditional probability corresponding to the functional component combination, and recording the triggering frequency of each functional component in the characteristic functional component combination;

And if the second conditional probability corresponding to the function component combination is greater than the predetermined function component combination probability threshold, determining that the current function component combination is the characteristic function component combination.

Further, the process of comparing the instruction response data of the model training terminal in the acquisition period with the data in the characteristic behavior database comprises,

Recording ports generating data interaction currently, if the current port combination exists in the characteristic behavior database, determining that the port combination is matched, and recording the current data interaction quantity average value corresponding to each port;

respectively calculating the difference between the current data interaction quantity average value corresponding to each port and the data interaction quantity average value stored in the characteristic behavior database, and obtaining the data interaction quantity matching characteristic after averaging the obtained difference values;

Recording the current triggered functional components, if the current functional component combination exists in the characteristic behavior database, determining that the functional component combination is matched, and recording the current triggering frequency of each functional component;

And calculating the difference between the current trigger frequency of each functional component and the trigger frequency stored in the characteristic behavior database, and obtaining the trigger frequency matching characteristic after averaging the obtained difference values.
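As an illustrative sketch only, the comparison described above could be expressed in Python as follows. The dictionary layout of the database, the function name `matching_characteristic` and the use of the absolute value of each difference are assumptions made for illustration; the text itself only speaks of "the difference" and does not say how an unmatched combination is handled.

```python
from statistics import mean


def matching_characteristic(current, database):
    """Compare current per-item values with the stored values for the same combination.

    current:  dict mapping a port (or functional component) to its current
              data interaction quantity average (or trigger frequency).
    database: dict mapping a frozenset of items to a dict of stored values
              (hypothetical layout of the characteristic behavior database).
    Returns the mean of the per-item differences, or None when the current
    combination is not present in the database.
    """
    combination = frozenset(current)
    if combination not in database:
        return None  # no match; the source does not specify further handling
    stored = database[combination]
    # Assumption: the "difference" is taken as an absolute value.
    return mean(abs(current[item] - stored[item]) for item in combination)


# Hypothetical stored averages for the port combination {1, 2, 3}.
port_database = {frozenset({1, 2, 3}): {1: 100.0, 2: 40.0, 3: 10.0}}
interaction_match = matching_characteristic({1: 110.0, 2: 34.0, 3: 12.0}, port_database)
unmatched = matching_characteristic({4: 5.0}, port_database)
```

The same function can be applied to functional component combinations by passing trigger frequencies instead of data interaction quantity averages.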

Further, the process of determining the behavior fit characterization coefficient corresponding to the model training terminal according to the comparison result comprises,

Calculating the ratio of a preset reference data interaction quantity matching threshold to the data interaction quantity matching characteristic to obtain a first fit characterization factor;

Calculating the ratio of a preset reference trigger frequency matching threshold to the trigger frequency matching characteristic to obtain a second fit characterization factor;

And determining the sum of the first fit characterization factor and the second fit characterization factor as the behavior fit characterization coefficient.

Further, the process of identifying the characteristic fit category of the current model training terminal comprises,

If the behavior fit characterization coefficient is greater than or equal to a preset behavior fit characterization coefficient threshold, determining that the current model training terminal is of the strong characteristic fit category;

and if the behavior fit characterization coefficient is smaller than the preset behavior fit characterization coefficient threshold, determining that the current model training terminal is of the weak characteristic fit category.

Further, controlling the data interaction of the model training terminal based on the characteristic fit category of the current access behavior comprises,

If the current model training terminal is of the weak characteristic fit category, invoking the characteristic behavior database to identify a normal characteristic port, verifying a data transmission rule of the normal characteristic port, determining whether to close the corresponding normal characteristic port, and synchronously sending verification requirement information to the model training terminal so as to determine whether to send an early warning signal according to feedback information of the model training terminal;

And if the current model training terminal is of the strong characteristic fit category, continuing to collect, period by period, the interaction data of the model training terminal and the instruction data of the corresponding instruction input device so as to determine the characteristic fit category of the model training terminal.

Further, the process of calling the characteristic behavior database to identify the normal characteristic port comprises,

Arranging each characteristic port combination according to the corresponding first conditional probability descending order to obtain a characteristic port combination sequence;

selecting, from the head of the characteristic port combination sequence, a predetermined proportion of the characteristic port combination serial numbers;

marking the characteristic port combinations corresponding to the selected serial numbers;

identifying the ports contained in the marked characteristic port combinations as normal characteristic ports;

Each characteristic port combination corresponds to a unique serial number, and the characteristic port combination sequence consists of each unique serial number.

Further, the process of verifying the data transmission rule of the normal characteristic port and determining whether to close the corresponding normal characteristic port includes,

Acquiring the current data interaction quantity of each normal characteristic port, respectively constructing a data interaction quantity time domain curve, and determining the normal data interaction characteristics;

Calling a characteristic behavior database to acquire data interaction amounts corresponding to the normal characteristic ports, constructing a data interaction amount sample time domain curve, and determining sample data interaction characteristics;

comparing the normal data interaction characteristics corresponding to the normal characteristic ports with the sample data interaction characteristics to determine characteristic difference quantity;

if the characteristic difference quantity is greater than or equal to a preset characteristic difference quantity threshold, determining that the normal characteristic port is to be closed;

The normal data interaction characteristics and the sample data interaction characteristics each comprise the average amplitude of the data interaction quantity, the average slope of the rising segments and the average slope of the falling segments.

Further, the feedback information comprises a key, and if the key does not match a preset key, it is determined that an early warning signal is to be generated.

Compared with the prior art, the invention establishes a characteristic behavior database for the model training terminal based on regularity features screened from the instruction response data. When the model training terminal is accessed, instruction response data acquired in real time is compared with the data in the characteristic behavior database to determine the behavior fit characterization coefficient corresponding to the model training terminal, the characteristic fit category of the current model training terminal is identified, and the interaction of the model training terminal is controlled in time. Under the weak characteristic fit category, the data transmission rule of each normal characteristic port is verified to determine whether the corresponding normal characteristic port is to be closed.

In particular, the invention establishes the characteristic behavior database for the model training terminal based on regularity features of the instruction response data. In practice, owing to the operating habits of an authorized operator, the instructions of the corresponding instruction input device show a certain regularity when the model training terminal is accessed over a long time: for example, some functional components of the model training terminal are triggered intensively while others are rarely triggered. The model training terminal therefore also shows regularity in its data interaction with each port: for example, data interaction occurs with specific ports and the data interaction quantity stays within a certain range. The invention accordingly treats characteristic port combinations and characteristic functional component combinations as regularity features and builds a dedicated characteristic behavior database, which provides data support for the subsequent identification of the characteristic fit category, so that the data interaction of the model training terminal is adaptively controlled, security is ensured and the risk of data leakage is reduced.

In particular, the invention calculates the behavior fit characterization coefficient and from it determines the characteristic fit category of the model training terminal. The coefficient characterizes the difference between the regularity shown by the instruction response data while the model training terminal is being accessed and the regularity shown by the instruction response data in the normal state. Whether the current model training terminal falls into the weak characteristic fit category can therefore be determined in time, access by a visitor masquerading as an authorized visitor can be identified, and subsequent intervention can be made promptly, so that the data interaction of the model training terminal is adaptively controlled and security is ensured.

In particular, the invention identifies the normal characteristic ports. In practice, the normal characteristic ports are determined from the characteristic behavior database and represent the port combinations with a higher data interaction frequency in the normal state, which also carry a higher intrusion risk. The invention further determines, in time, whether to close a normal characteristic port based on the transmission rule of that port, and the verification of the transmission rule considers both the data interaction quantity of the port and how that quantity changes. In practice, if a masquerading operator accesses the model training terminal to steal data, interfere with data interaction, or contaminate the model by feeding data into it, this is reflected in the transmission rule. The invention therefore identifies such situations and closes the transmission port in time, which ensures the security of the model training terminal and reduces the risk of data leakage.

Drawings

FIG. 1 is a schematic diagram of the steps of a method for managing large model data leakage according to an embodiment of the invention;

FIG. 2 is a logic decision diagram for identifying feature fit categories of a current model training terminal according to an embodiment of the invention;

FIG. 3 is a logic decision diagram of an embodiment of the present invention for determining whether to close a corresponding normal feature port;

FIG. 4 is a logic diagram of an embodiment of the present invention for determining whether to issue an early warning signal.

Detailed Description

The invention will be further described with reference to examples for the purpose of making the objects and advantages of the invention more apparent, it being understood that the specific examples described herein are given by way of illustration only and are not intended to be limiting.

Preferred embodiments of the present invention are described below with reference to the accompanying drawings. It should be understood by those skilled in the art that these embodiments are merely for explaining the technical principles of the present invention, and are not intended to limit the scope of the present invention.

Referring to fig. 1 to 4, fig. 1 is a schematic diagram illustrating steps of a method for managing large model data leakage according to an embodiment of the present invention, fig. 2 is a logic determination chart for identifying a feature fitting type of a training terminal of a current model according to an embodiment of the present invention, fig. 3 is a logic determination chart for determining whether to close a corresponding normal feature port according to an embodiment of the present invention, and fig. 4 is a logic determination chart for determining whether to send an early warning signal according to an embodiment of the present invention, where the method for managing large model data leakage according to the present embodiment includes:

S1, acquiring instruction response data in a history period of a model training terminal, wherein the instruction response data comprises data interaction amounts of the model training terminal and each port and instruction triggering targets corresponding to instruction input equipment;

S2, screening regularity features based on the instruction response data to establish a characteristic behavior database for the model training terminal, wherein the regularity features comprise characteristic port combinations and characteristic functional component combinations;

S3, in response to the model training terminal being accessed, comparing instruction response data of the model training terminal within an acquisition period with the data in the characteristic behavior database;

S4, determining a behavior fit characterization coefficient corresponding to the model training terminal according to the comparison result, so as to identify the characteristic fit category of the current model training terminal;

S5, controlling data interaction of the model training terminal based on the characteristic fit category of the current access behavior.

Specifically, the specific structure of the model training terminal is not limited. The model training terminal is a logic component capable of executing logic operations and only needs to be able to load or store a model for training, for example a computer, which is not described again.

Specifically, in implementation, the instruction input device is connected with the model training terminal in a matching manner, so as to input an instruction to the model training terminal to realize a corresponding function, such as a keyboard, a mouse, and the like.

Specifically, for the functional component, the functional component is configured in the model training terminal to implement a corresponding function, for example, a certain software in the model training terminal, and generally, the functional component can be triggered by the instruction input device through a control instruction to implement the corresponding function, which is not described herein.

Specifically, the model training terminal being accessed includes a visitor accessing the model training terminal and the model training terminal being controlled through the instruction input device.

Specifically, in a computer network, ports refer to logical communication endpoints that are used to distinguish between different network services or processes running on the terminal, each port being identified by a unique port number, which is a 16-bit number ranging from 0 to 65535, and will not be described in detail.

Specifically, in step S2, the process of screening feature port combinations based on the instruction response data includes,

Determining a first conditional probability of data interaction between the model training terminal and each of the different port combinations over a plurality of accesses, wherein it can be understood that a single access spans from the moment a visitor starts accessing the model training terminal to the moment the visitor stops accessing it;

It can be appreciated that, because control instructions differ, the model training terminal may interact with different ports during a single access.

Taking a total of 5 ports as an example, the port serial numbers are 1, 2, 3, 4 and 5 respectively;

If the current port combination is 1, 2 and 3 and the model training terminal is accessed 10 times, of which 3 times are under the current port combination, the first conditional probability corresponding to the port combination is 3/10, namely the ratio of the number of times the model training terminal is accessed under the current port combination to the total number of times the model training terminal is accessed;

Specifically, the port combination conditional probability threshold is determined based on the average of the first conditional probabilities of the port combinations, and is set to the product of that average and a precision coefficient, the precision coefficient being selected within the interval [1.05, 1.15].
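The screening of characteristic port combinations can be sketched in Python as below, reusing the 3-out-of-10 example. This is a minimal sketch under stated assumptions: the access log is taken to be a list with one set of ports per access, the precision coefficient 1.10 is one value picked from the stated interval, and the function name is hypothetical.

```python
from collections import Counter
from statistics import mean


def screen_port_combinations(access_log, precision=1.10):
    """access_log: one frozenset of port numbers per access (hypothetical layout)."""
    total = len(access_log)
    counts = Counter(access_log)
    # First conditional probability: accesses under a combination / total accesses.
    probabilities = {combo: n / total for combo, n in counts.items()}
    # Threshold: average first conditional probability times the precision coefficient.
    threshold = mean(probabilities.values()) * precision
    characteristic = {c: p for c, p in probabilities.items() if p > threshold}
    return probabilities, threshold, characteristic


# Ten accesses: three under ports {1, 2, 3}, five under {1, 2}, two under {4, 5}.
log = [frozenset({1, 2, 3})] * 3 + [frozenset({1, 2})] * 5 + [frozenset({4, 5})] * 2
probabilities, threshold, characteristic = screen_port_combinations(log)
```

With this log the average probability is 1/3, so the threshold is about 0.367 and only the combination {1, 2} (probability 0.5) is kept as a characteristic port combination.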

Specifically, in step S2, the process of screening characteristic functional component combinations based on the instruction response data includes,

It can be understood that the second conditional probability is the ratio of the number of times the model training terminal is accessed under the functional component combination to the total number of times the model training terminal is accessed;

And determining a characteristic functional component combination based on the second conditional probability corresponding to the functional component combination, and recording the trigger frequency of each functional component in the characteristic functional component combination, wherein the trigger frequency is the number of times the functional component is triggered within a reference time, and the reference time can be selected from 3 min to 10 min so that the acquired instruction response data is representative.

Specifically, in implementation, the functional component combination probability threshold is determined based on the average of the second conditional probabilities with which the functional component combinations are triggered, and is set to the product of that average and the precision coefficient.
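The trigger frequency, defined above as the number of triggers within a reference time, could be counted as in the following sketch. The event layout (timestamp in seconds plus component name), the component names and the choice to start the reference time at the first recorded event are assumptions for illustration; 300 s is simply one value inside the 3 min to 10 min range.

```python
from collections import Counter


def trigger_frequency(events, reference_seconds=300):
    """events: list of (timestamp_seconds, component_name) pairs (hypothetical layout).

    Returns, per functional component, the number of triggers that fall
    within the reference time counted from the first recorded event.
    """
    if not events:
        return {}
    start = min(timestamp for timestamp, _ in events)
    in_window = (name for timestamp, name in events
                 if timestamp - start < reference_seconds)
    return dict(Counter(in_window))


# Hypothetical trigger events; the last one falls outside the 300 s window.
events = [(0, "trainer"), (10, "trainer"), (20, "exporter"), (400, "trainer")]
frequencies = trigger_frequency(events)
```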

The invention establishes the characteristic behavior database for the model training terminal based on regularity features of the instruction response data. In practice, owing to the operating habits of an authorized operator, the instructions of the corresponding instruction input device show a certain regularity when the model training terminal is accessed over a long time: for example, some functional components of the model training terminal are triggered intensively while others are rarely triggered. The model training terminal therefore also shows regularity in its data interaction with each port: for example, data interaction occurs with specific ports and the data interaction quantity stays within a certain range. The invention accordingly treats characteristic port combinations and characteristic functional component combinations as regularity features and builds a dedicated characteristic behavior database, which provides data support for the subsequent identification of the characteristic fit category, so that the data interaction of the model training terminal is adaptively controlled, security is ensured and the risk of data leakage is reduced.

Specifically, in step S3, the process of comparing the instruction response data of the model training terminal in the acquisition period with the data in the characteristic behavior database comprises,

Recording a current port generating data interaction, if the current port combination exists in the characteristic behavior database, determining port combination matching, and recording a current data interaction quantity average value corresponding to each port, wherein it can be understood that the data interaction quantity of each port in each time period in a certain time can be synchronously recorded, and then the data interaction quantity average value is solved;

Specifically, in step S4, the process of determining the behavior fit characterization coefficient corresponding to the model training terminal according to the comparison result includes,

Specifically, the reference data interaction quantity matching threshold is determined based on the data interaction quantity average of the ports stored in the characteristic behavior database, and is typically set to between 0.25 and 0.35 times that average.

Specifically, the reference trigger frequency matching threshold is determined based on the trigger frequency of the functional components stored in the characteristic behavior database, and is typically set to between 0.15 and 0.25 times that trigger frequency.

The invention calculates the behavior fit characterization coefficient and from it determines the characteristic fit category of the model training terminal. The coefficient characterizes the difference between the regularity shown by the instruction response data while the model training terminal is being accessed and the regularity shown by the instruction response data in the normal state. Whether the current model training terminal falls into the weak characteristic fit category can therefore be determined in time, access by a visitor masquerading as an authorized visitor can be identified, and subsequent intervention can be made promptly, so that the data interaction of the model training terminal is adaptively controlled and security is ensured.

Specifically, in step S4, the process of identifying the feature fitting category of the current model training terminal comprises,

Specifically, the behavior fit characterization coefficient threshold is set with reference to the first fit characterization factor and the second fit characterization factor. It can be understood that a fit characterization factor close to 1 indicates that the corresponding matching characteristic is approaching the acceptable upper limit of difference, since each factor is the ratio of a reference matching threshold to a matching characteristic. To characterize this upper limit of difference, a person skilled in the art can select the behavior fit characterization coefficient threshold within the interval [2.15, 2.25].
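The calculation of the coefficient and the resulting category can be sketched as follows. This is a hedged illustration, not a definitive implementation: the function names are hypothetical, the reference matching thresholds are passed in directly, the coefficient threshold 2.2 is one value from the stated interval, and treating a matching characteristic of zero as an unbounded factor is an assumption because the text does not address a zero denominator.

```python
import math


def fit_factor(reference_threshold, matching_characteristic):
    """Ratio of a reference matching threshold to a matching characteristic."""
    if matching_characteristic == 0:
        return math.inf  # assumption: a perfect match counts as the strongest fit
    return reference_threshold / matching_characteristic


def classify(interaction_match, frequency_match,
             interaction_reference, frequency_reference,
             coefficient_threshold=2.2):
    # Behavior fit characterization coefficient: sum of the two fit factors.
    coefficient = (fit_factor(interaction_reference, interaction_match)
                   + fit_factor(frequency_reference, frequency_match))
    category = "strong" if coefficient >= coefficient_threshold else "weak"
    return coefficient, category


# Small differences relative to the reference thresholds: 15/6 + 4/5 = 3.3.
close_coefficient, close_category = classify(6.0, 5.0, 15.0, 4.0)
# Large differences: 15/20 + 4/8 = 1.25, below the threshold.
far_coefficient, far_category = classify(20.0, 8.0, 15.0, 4.0)
```

Because each factor is a threshold divided by an observed difference, a smaller difference yields a larger coefficient, which is why the strong category corresponds to values at or above the threshold.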

Specifically, in step S5, controlling the data interaction of the model training terminal based on the characteristic fit category of the current access behavior comprises,

Specifically, in step S5, the process of calling the characteristic behavior database to identify the normal characteristic port comprises,

Specifically, so that only the combinations at the head of the characteristic port combination sequence, which are the most representative, are selected, the predetermined proportion is selected within 15% to 30%.
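Identifying the normal characteristic ports by ranking and taking the head of the sequence could look like the sketch below. The function name, the rounding rule (round up, keep at least one combination) and the sample probabilities are assumptions; the proportion 0.25 is one value inside the 15% to 30% range.

```python
import math


def normal_characteristic_ports(combination_probabilities, proportion=0.25):
    """combination_probabilities: dict mapping a frozenset of ports to its
    first conditional probability (hypothetical layout)."""
    # Rank the combinations by first conditional probability, descending;
    # the position in this list plays the role of the serial number.
    ranked = sorted(combination_probabilities.items(),
                    key=lambda item: item[1], reverse=True)
    # Assumption: round up and always keep at least one combination.
    keep = max(1, math.ceil(len(ranked) * proportion))
    ports = set()
    for combination, _ in ranked[:keep]:
        ports |= combination
    return ports


# Hypothetical characteristic port combinations with their probabilities.
stored = {
    frozenset({1, 2}): 0.50,
    frozenset({1, 2, 3}): 0.40,
    frozenset({4}): 0.38,
    frozenset({2, 5}): 0.37,
}
top_quarter_ports = normal_characteristic_ports(stored)
top_half_ports = normal_characteristic_ports(stored, proportion=0.5)
```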

Specifically, in step S5, the process of verifying the data transmission rule of the normal feature port and determining whether to close the corresponding normal feature port includes,

Specifically, the characteristic difference quantity is determined based on the comparison of the normal data interaction characteristics with the sample data interaction characteristics, which comprises,

Determining the difference between the average amplitude of the data interaction quantity in the normal data interaction characteristics and the average amplitude of the data interaction quantity in the sample data interaction characteristics, and taking the ratio of the obtained difference to the average amplitude of the data interaction quantity in the sample data interaction characteristics to obtain a first characteristic difference quantity;

determining the difference between the average slope of the rising segments in the normal data interaction characteristics and the average slope of the rising segments in the sample data interaction characteristics, and taking the ratio of the obtained difference to the average slope of the rising segments in the sample data interaction characteristics to obtain a second characteristic difference quantity;

Determining the difference between the average slope of the falling segments in the normal data interaction characteristics and the average slope of the falling segments in the sample data interaction characteristics, and taking the ratio of the obtained difference to the average slope of the falling segments in the sample data interaction characteristics to obtain a third characteristic difference quantity;

Setting the sum of the first characteristic difference quantity, the second characteristic difference quantity and the third characteristic difference quantity as the characteristic difference quantity;

It can be appreciated that the normal data interaction characteristics and the sample data interaction characteristics can be extracted from the corresponding time domain curves, which is not described herein.

It can be understood that the data interaction quantity of a normal characteristic port is allowed a certain deviation, and the characteristic difference quantity threshold is set to represent the case in which the difference is large, so a person skilled in the art can select the characteristic difference quantity threshold within the interval [0.65, 0.85].
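The verification of a port's transmission rule can be sketched as follows. Several points are assumptions made for illustration rather than details given in the text: curves are lists of equally spaced samples, the average amplitude is taken as the mean of the samples, slopes are differences between consecutive samples, each relative difference is taken as an absolute value, and a term whose sample value is zero is skipped. The threshold 0.75 is one value from the stated interval.

```python
def curve_characteristics(samples):
    """samples: data interaction quantities at equally spaced instants."""
    steps = [b - a for a, b in zip(samples, samples[1:])]
    rising = [s for s in steps if s > 0]
    falling = [s for s in steps if s < 0]
    return {
        "amplitude": sum(samples) / len(samples),
        "rise": sum(rising) / len(rising) if rising else 0.0,
        "fall": sum(falling) / len(falling) if falling else 0.0,
    }


def characteristic_difference(current, sample):
    total = 0.0
    for key in ("amplitude", "rise", "fall"):
        if sample[key] == 0:
            continue  # assumption: skip a term whose sample value is zero
        # Relative difference with respect to the sample characteristic.
        total += abs(current[key] - sample[key]) / abs(sample[key])
    return total


def should_close(current_curve, sample_curve, threshold=0.75):
    difference = characteristic_difference(curve_characteristics(current_curve),
                                           curve_characteristics(sample_curve))
    return difference >= threshold


sample_curve = [10, 20, 30, 20, 10]    # stored behavior of the port
current_curve = [10, 30, 50, 30, 10]   # steeper and larger than the sample
difference = characteristic_difference(curve_characteristics(current_curve),
                                       curve_characteristics(sample_curve))
close_port = should_close(current_curve, sample_curve)
keep_port_open = not should_close(sample_curve, sample_curve)
```

In this example the amplitude term contributes 8/18 and each slope term contributes 1, so the characteristic difference quantity is about 2.44 and the port would be closed.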

The invention identifies the normal characteristic ports. In practice, the normal characteristic ports are determined from the characteristic behavior database and represent the port combinations with higher data interaction frequency in the normal state, which therefore carry a higher intrusion risk. Furthermore, the invention judges in time whether to close a normal characteristic port based on its transmission rule, and the verification of the transmission rule takes into account both the data interaction amount of the normal characteristic port and how that amount changes. In practice, if a disguised operator accesses the model training terminal to steal data, interfere with data interaction, or pollute the model by feeding data into it, this is reflected in the transmission rule. The invention therefore recognizes these situations and closes the transmission port in time, which ensures the security of the model training terminal and reduces the risk of data leakage.
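One possible way to pick the normal characteristic ports from the characteristic behavior database is sketched below: rank the characteristic port combinations by their first conditional probability in descending order, take a predetermined proportion from the head of the ranking, and collect the ports they contain. The function name, the input layout (a mapping from port tuples to probabilities), the default proportion of 0.3, and rounding the count up are all assumptions of this example.

```python
import math


def normal_characteristic_ports(combo_probabilities: dict, proportion: float = 0.3) -> set:
    """Return the ports contained in the top-ranked characteristic port combinations.

    combo_probabilities: maps a tuple of ports to its first conditional
    probability (hypothetical representation of the database contents).
    """
    if not 0 < proportion <= 1:
        raise ValueError("proportion must be in (0, 1]")
    # Descending order by first conditional probability.
    ranked = sorted(combo_probabilities.items(), key=lambda kv: kv[1], reverse=True)
    # Assumption: round up so that at least one combination is selected.
    count = math.ceil(len(ranked) * proportion)
    ports = set()
    for combo, _probability in ranked[:count]:
        ports.update(combo)
    return ports
```

With four stored combinations and a proportion of 0.5, the two most probable combinations are marked and every port they contain is treated as a normal characteristic port.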

Specifically, the feedback information includes a key, and if the key does not match the predetermined key, it is determined that an early warning signal is to be generated.

It can be understood that the verification requirement information includes a request for the model training terminal to provide the key.

The key may be preset and may use either asymmetric or symmetric encryption; a person skilled in the art can verify whether the key matches according to the corresponding encryption mode, which is not described in detail here.

The generated early warning signal can be sent to a monitoring end for monitoring, which is not described again.
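The key check and the early warning can be illustrated with a short sketch. It assumes the simplest case of a preset shared key compared byte for byte; an asymmetric scheme would replace the comparison with a signature check and is not shown. The function names, the `feedback` dictionary layout, and the `notify` callback that stands in for the monitoring end are hypothetical.

```python
import hmac


def key_matches(feedback_key: bytes, predetermined_key: bytes) -> bool:
    """Compare the returned key with the preset key in constant time."""
    return hmac.compare_digest(feedback_key, predetermined_key)


def handle_feedback(feedback: dict, predetermined_key: bytes, notify) -> bool:
    """Return True and notify the monitoring end if a warning is generated."""
    # A missing key is treated as a mismatch (assumption of this sketch).
    key = feedback.get("key", b"")
    if key_matches(key, predetermined_key):
        return False
    notify({"type": "early_warning", "reason": "key mismatch"})
    return True
```

Using `hmac.compare_digest` rather than `==` avoids leaking how many leading bytes of the key were correct through timing differences.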

If the large model data leakage management method of the present invention is implemented in the form of a software functional unit and sold or used as a stand-alone product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, in essence or in whole or in part, may be embodied in the form of a software product stored in a storage medium, including instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the method according to the embodiments of the present invention. The storage medium includes a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or any other medium capable of storing program code.

Thus far, the technical solution of the present invention has been described in connection with the preferred embodiments shown in the drawings, but it is easily understood by those skilled in the art that the scope of protection of the present invention is not limited to these specific embodiments. Equivalent modifications and substitutions for related technical features may be made by those skilled in the art without departing from the principles of the present invention, and such modifications and substitutions will be within the scope of the present invention.

Claims

1. A method for managing large model data leakage, characterized by comprising:

Obtaining instruction response data of a model training terminal within a historical period, including the data interaction amount between the model training terminal and each port and the instruction trigger targets of the corresponding instruction input device;

Screening regular features based on the instruction response data to establish a characteristic behavior database for the model training terminal, wherein the regular features include characteristic port combinations and characteristic function component combinations;

In response to the model training terminal being accessed, comparing the instruction response data of the model training terminal within an acquisition period with the data in the characteristic behavior database;

Determining, according to the comparison result, the behavior fitting characterization coefficient corresponding to the model training terminal, so as to identify the feature fitting category of the current model training terminal;

Controlling the data interaction of the model training terminal based on the feature fitting category of the current access behavior, including:

Calling the characteristic behavior database to identify normal characteristic ports, verifying the data transmission rule of the normal characteristic ports, determining whether to close the corresponding normal characteristic port, and synchronously sending verification requirement information to the model training terminal so as to determine, based on feedback information from the model training terminal, whether to issue an early warning signal;

Or, continuing to collect the interaction data of the model training terminal and the instruction data of the corresponding instruction input device within the acquisition period, so as to determine the feature fitting category of the model training terminal.

2. The method for managing large model data leakage according to claim 1, characterized in that the process of screening characteristic port combinations based on the instruction response data includes:

Determining a first conditional probability that the model training terminal interacts with each port combination over a number of accesses;

Determining the characteristic port combinations based on the first conditional probability corresponding to each port combination, recording the average data interaction amount of each port in a characteristic port combination during data interaction, and storing it in the characteristic behavior database;

Wherein, if the first conditional probability corresponding to a port combination is greater than a predetermined port combination conditional probability threshold, the port combination is determined to be a characteristic port combination.

3. The method for managing large model data leakage according to claim 1, characterized in that the process of screening characteristic function component combinations based on the instruction response data includes:

Obtaining a second conditional probability that each function component combination is triggered over a number of accesses to the model training terminal;

Determining the characteristic function component combinations based on the second conditional probability corresponding to each function component combination, and recording the trigger frequency of each function component in a characteristic function component combination;

Wherein, if the second conditional probability corresponding to a function component combination is greater than a predetermined function component combination probability threshold, the function component combination is determined to be a characteristic function component combination.

4. The method for managing large model data leakage according to claim 1, characterized in that the process of comparing the instruction response data of the model training terminal within the acquisition period with the data in the characteristic behavior database includes:

Recording the ports that currently generate data interaction, and if the current port combination exists in the characteristic behavior database, determining that the port combination matches and recording the current average data interaction amount corresponding to each port;

Calculating, for each port, the difference between the current average data interaction amount and the average data interaction amount stored in the characteristic behavior database, and averaging the obtained differences to obtain a data interaction amount matching feature;

Recording the currently triggered function components, and if the current function component combination exists in the characteristic behavior database, determining that the function component combination matches and recording the current trigger frequency of each function component;

Calculating, for each function component, the difference between the current trigger frequency and the trigger frequency stored in the characteristic behavior database, and averaging the obtained differences to obtain a trigger frequency matching feature.

5. The method for managing large model data leakage according to claim 4, characterized in that the process of determining the behavior fitting characterization coefficient corresponding to the model training terminal according to the comparison result includes:

Calculating the ratio of a predetermined reference data interaction amount matching threshold to the data interaction amount matching feature to obtain a first fitting characterization factor;

Calculating the ratio of a predetermined reference trigger frequency matching threshold to the trigger frequency matching feature to obtain a second fitting characterization factor;

Determining the sum of the first fitting characterization factor and the second fitting characterization factor as the behavior fitting characterization coefficient.

6. The method for managing large model data leakage according to claim 1, characterized in that the process of identifying the feature fitting category of the current model training terminal includes:

If the behavior fitting characterization coefficient is greater than or equal to a predetermined fitting characterization coefficient threshold, determining that the current model training terminal belongs to the strong feature fitting category;

If the behavior fitting characterization coefficient is less than the predetermined fitting characterization coefficient threshold, determining that the current model training terminal belongs to the weak feature fitting category.

7. The method for managing large model data leakage according to claim 6, characterized in that controlling the data interaction of the model training terminal based on the feature fitting category of the current access behavior includes:

If the current model training terminal belongs to the weak feature fitting category, calling the characteristic behavior database to identify the normal characteristic ports, verifying the data transmission rule of the normal characteristic ports, determining whether to close the corresponding normal characteristic port, and synchronously sending verification requirement information to the model training terminal so as to determine, based on feedback information from the model training terminal, whether to issue an early warning signal;

If the current model training terminal belongs to the strong feature fitting category, continuing to collect the interaction data of the model training terminal and the instruction data of the corresponding instruction input device within the acquisition period, so as to determine the feature fitting category of the model training terminal.

8. The method for managing large model data leakage according to claim 2, characterized in that the process of calling the characteristic behavior database to identify the normal characteristic ports includes:

Arranging the characteristic port combinations in descending order of their corresponding first conditional probabilities to obtain a characteristic port combination sequence;

Selecting a predetermined proportion of the characteristic port combination serial numbers from the head of the characteristic port combination sequence;

Marking the corresponding characteristic port combinations based on the selected serial numbers;

Identifying the ports included in the marked characteristic port combinations as normal characteristic ports;

Wherein each characteristic port combination corresponds to a unique serial number, and the characteristic port combination sequence is composed of the unique serial numbers.

9. The method for managing large model data leakage according to claim 1, characterized in that the process of verifying the data transmission rule of the normal characteristic ports and determining whether to close the corresponding normal characteristic port includes:

Obtaining the current data interaction amount of each normal characteristic port, constructing a time domain curve of the data interaction amount for each port, and determining the regular data interaction characteristics;

Calling the characteristic behavior database to obtain the data interaction amount corresponding to each normal characteristic port, constructing a sample time domain curve of the data interaction amount, and determining the sample data interaction characteristics;

Comparing the regular data interaction characteristics corresponding to the normal characteristic port with the sample data interaction characteristics to determine the feature difference amount;

If the feature difference amount is greater than or equal to a predetermined feature difference amount threshold, determining to close the normal characteristic port;

Wherein the regular data interaction characteristics and the sample data interaction characteristics each include the average amplitude of the data interaction amount, the average slope of the rising segments, and the average slope of the descending segments.

10. The method for managing large model data leakage according to claim 1, characterized in that the feedback information includes a key, and if the key does not match the predetermined key, it is determined to generate an early warning signal.