CN117474006A - Data security protection integrated system and method - Google Patents

Info

Publication number
CN117474006A
Authority
CN
China
Prior art keywords
data
feature
semantic
feature vector
scale
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202311376336.3A
Other languages
Chinese (zh)
Inventor
翁武焰
何颖
吴慧明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fujian Zhongxin Wang 'an Information Technology Co ltd
Original Assignee
Fujian Zhongxin Wang 'an Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fujian Zhongxin Wang 'an Information Technology Co ltd
Priority to CN202311376336.3A
Publication of CN117474006A

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Machine Translation (AREA)

Abstract

This application relates to the technical field of intelligent protection and specifically discloses a data security protection integrated system and a method thereof. An artificial intelligence detection algorithm based on deep learning extracts feature information from the data to be detected and from the sensitive vocabulary, and a transfer matrix between the features of the data to be detected and the features of the sensitive vocabulary is then calculated to represent the feature similarity between the two, thereby determining whether the data is sensitive data. In this way, large amounts of data can be processed automatically, improving the accuracy and efficiency of sensitive data identification.

Description

Data security protection integrated system and method thereof
Technical Field
The application relates to the technical field of intelligent protection, and more particularly, to a data security protection integrated system and a method thereof.
Background
With the rapid development of the Internet, data security has become increasingly important, and sensitive data identification plays an important role in it. Its main purpose is to identify and mark the sensitive data stored in a system so that corresponding security measures can be taken to protect that data. Sensitive data refers to data that may cause serious harm to society or individuals if leaked. Sensitive data is also called private data and covers all undisclosed or classified information, including personal private data such as names, identity card numbers, addresses, telephone numbers, bank account numbers, email addresses, passwords, medical information, and educational background, as well as private enterprise information such as business operations, customer information, and trade secrets.
Through sensitive data identification, such data can be found and marked in time, so that appropriate security measures can be taken to prevent access, leakage, or abuse by unauthorized persons. Because most data today is large in volume and complex, traditional manual sorting is slow, and different people may judge the same data differently, so the identification results for sensitive data are inconsistent.
Accordingly, a data security protection integrated system and a method thereof are desired.
Disclosure of Invention
The present application has been made in order to solve the above technical problems. The embodiments of the application provide a data security protection integrated system and a method thereof, which adopt an artificial intelligence detection algorithm based on deep learning to extract feature information from the data to be detected and from the sensitive vocabulary, and then calculate a transfer matrix between the features of the data to be detected and the features of the sensitive vocabulary to represent the feature similarity between the two, so as to judge whether the data is sensitive data. Thus, a large amount of data can be processed automatically, and the accuracy and efficiency of sensitive data identification are improved.
Accordingly, according to one aspect of the present application, there is provided a data security protection integrated system, comprising:
a data acquisition module, configured to acquire data to be detected and a sensitive vocabulary set;
a to-be-detected data semantic understanding module, configured to pass the data to be detected through a context encoder comprising an embedding layer to obtain a plurality of context semantic feature vectors;
a first scale perception module, configured to concatenate the plurality of context semantic feature vectors to obtain a first-scale semantic association feature vector;
a second scale perception module, configured to arrange the plurality of context semantic feature vectors two-dimensionally into a context semantic feature matrix and then obtain a second-scale semantic association feature vector through a convolutional neural network model comprising a plurality of mixed convolution layers;
a multi-scale fusion module, configured to perform interpolation-order fusion on the first-scale semantic association feature vector and the second-scale semantic association feature vector to obtain a data feature vector to be detected;
a sensitive vocabulary semantic understanding module, configured to pass the sensitive vocabulary set through a context encoder comprising an embedding layer to obtain a sensitive data feature vector;
a transfer calculation module, configured to calculate a transfer matrix between the data feature vector to be detected and the sensitive data feature vector as a classification feature matrix;
and a detection result generation module, configured to pass the classification feature matrix through a classifier to obtain a classification result, wherein the classification result indicates whether the data to be detected is sensitive data.
In the above data security protection integrated system, the to-be-detected data semantic understanding module includes: an embedding unit, configured to map the text data of each data item in the data to be detected into a word embedding vector by using the embedding layer of the context encoder; a data adding unit, configured to append the numerical data in each data item to the tail of the word embedding vector of that data item to obtain a plurality of data item embedding vectors; and a context encoding unit, configured to perform context semantic encoding on the plurality of data item embedding vectors by using the transformer-based BERT model of the context encoder to obtain the plurality of context semantic feature vectors.
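The embedding and data adding units can be illustrated with a minimal sketch. The toy `EMBEDDINGS` table, the function names, and the averaging of token embeddings into a single vector per data item are assumptions made for illustration; the source only states that text is mapped to a word embedding vector and that numerical data is appended to its tail.

```python
from typing import Dict, List

# Hypothetical toy embedding table; a real system would use the learned
# embedding layer of the context encoder instead.
EMBEDDINGS: Dict[str, List[float]] = {
    "name": [0.1, 0.3],
    "phone": [0.7, 0.2],
    "<unk>": [0.0, 0.0],
}


def embed_text(tokens: List[str]) -> List[float]:
    """Map the text of one data item to a vector (assumed: mean of token embeddings)."""
    dim = len(EMBEDDINGS["<unk>"])
    vectors = [EMBEDDINGS.get(token, EMBEDDINGS["<unk>"]) for token in tokens]
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def data_item_embedding(tokens: List[str], numbers: List[float]) -> List[float]:
    """Append the item's numerical data to the tail of its word embedding vector."""
    return embed_text(tokens) + [float(x) for x in numbers]


item_vector = data_item_embedding(["phone"], [13.0])
```

Appending raw numbers keeps the numerical fields visible to the encoder, at the cost of mixing differently scaled values in one vector; any rescaling would be a further design choice that the source does not describe.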
In the above data security protection integrated system, the context encoding unit includes: a one-dimensional arrangement subunit, configured to arrange the plurality of data item embedding vectors one-dimensionally to obtain a data item global embedding vector; a self-attention generation subunit, configured to calculate the product between the data item global embedding vector and the transpose of each of the plurality of data item embedding vectors to obtain a plurality of self-attention association matrices; a normalized self-attention subunit, configured to normalize each of the plurality of self-attention association matrices to obtain a plurality of normalized self-attention association matrices; a weight generation subunit, configured to pass each of the plurality of normalized self-attention association matrices through a classification function to obtain a plurality of probability values; and a weighting subunit, configured to weight each of the plurality of data item embedding vectors by the corresponding probability value to obtain the plurality of context semantic feature vectors.
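The five subunits above can be sketched in plain Python. Several details are assumptions, because the text does not specify them: the normalization is taken to be division by the square root of the global vector length, each normalized matrix is reduced to a score by taking its mean entry, and the classification function is taken to be a softmax over those scores.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax, standing in for the 'classification function'."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def context_semantic_vectors(items: List[Vector]) -> List[Vector]:
    # 1. One-dimensional arrangement: concatenate all data item embedding vectors.
    global_vec = [x for item in items for x in item]
    scale = math.sqrt(len(global_vec))  # assumed normalization factor
    scores = []
    for item in items:
        # 2. Association matrix: product of the global vector and the item's transpose.
        matrix = [[g * x for x in item] for g in global_vec]
        # 3. Normalization (assumed form): divide every entry by the scale factor.
        normalized = [[entry / scale for entry in row] for row in matrix]
        # Assumed reduction: the mean entry of the matrix is the item's score.
        count = len(normalized) * len(normalized[0])
        scores.append(sum(sum(row) for row in normalized) / count)
    # 4. The classification function turns the scores into one probability per item.
    weights = softmax(scores)
    # 5. Weight each data item embedding vector by its probability value.
    return [[w * x for x in item] for w, item in zip(weights, items)]


symmetric = context_semantic_vectors([[1.0, 0.0], [0.0, 1.0]])
```

With two items that contribute equally to the global vector, both receive weight 0.5, so `symmetric` is each input scaled by one half.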
In the above data security protection integrated system, the second scale perception module is configured to use each mixed convolution layer of the convolutional neural network model to process the input data in the forward pass of that layer as follows: performing convolutional encoding on the context semantic feature matrix using a first convolution kernel having a first size to obtain a first scale feature map; performing convolutional encoding on the context semantic feature matrix using a second convolution kernel having a first dilation rate to obtain a second scale feature map; performing convolutional encoding on the context semantic feature matrix using a third convolution kernel having a second dilation rate to obtain a third scale feature map; performing convolutional encoding on the context semantic feature matrix using a fourth convolution kernel having a third dilation rate to obtain a fourth scale feature map, wherein the first, second, third, and fourth convolution kernels have the same size, and the second, third, and fourth convolution kernels have different dilation rates; aggregating the first, second, third, and fourth scale feature maps along the channel dimension to obtain an aggregated feature map; performing global pooling on each feature matrix of the aggregated feature map along the channel dimension to generate a pooled feature map; and performing activation processing on the pooled feature map to generate an activated feature map; wherein the output of the last layer of the convolutional neural network model comprising a plurality of mixed convolution layers is the second-scale semantic association feature vector.
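A single mixed convolution layer of this kind can be sketched without any framework. The dilation rates `(1, 2, 3, 4)`, the zero padding that keeps every feature map the same size, the use of global average pooling, and the ReLU activation are all assumptions chosen for illustration; the source fixes only that the four kernels share one size and that three of them use different dilation rates.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def dilated_conv2d(x: Matrix, kernel: Matrix, dilation: int = 1) -> Matrix:
    """2-D cross-correlation with zero padding so the output matches the input size."""
    k = len(kernel)
    half = k // 2
    rows, cols = len(x), len(x[0])
    out = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            acc = 0.0
            for a in range(k):
                for b in range(k):
                    # Kernel taps are spread apart by the dilation rate.
                    r = i + (a - half) * dilation
                    c = j + (b - half) * dilation
                    if 0 <= r < rows and 0 <= c < cols:
                        acc += kernel[a][b] * x[r][c]
            out[i][j] = acc
    return out


def mixed_conv_layer(
    x: Matrix, kernels: Sequence[Matrix], dilations: Sequence[int] = (1, 2, 3, 4)
) -> List[float]:
    # One feature map per branch; the list index plays the role of the channel dimension.
    maps = [dilated_conv2d(x, kern, d) for kern, d in zip(kernels, dilations)]
    # Global average pooling reduces each channel's feature matrix to one value.
    pooled = [sum(map(sum, m)) / (len(m) * len(m[0])) for m in maps]
    # ReLU activation.
    return [max(0.0, p) for p in pooled]


ones_input = [[1.0] * 5 for _ in range(5)]
identity = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
layer_output = mixed_conv_layer(ones_input, [identity] * 4)
```

Larger dilation rates widen the receptive field of a fixed-size kernel without adding parameters, which is how branches of equal kernel size can respond to context at different scales. Stacking several such layers, as the text describes, is omitted here.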
In the above data security protection integrated system, the multi-scale fusion module includes: a difference calculation unit, configured to calculate the position-wise difference between the first-scale semantic association feature vector and the second-scale semantic association feature vector to obtain a difference feature vector; a position-wise addition unit, configured to calculate the position-wise sum of the first-scale semantic association feature vector and the second-scale semantic association feature vector to obtain a sum feature vector; a cosine similarity calculation unit, configured to calculate the cosine similarity between the difference feature vector and the sum feature vector; and a weighted fusion unit, configured to take the cosine similarity between the difference feature vector and the sum feature vector as a weight parameter and fuse the first-scale semantic association feature vector and the second-scale semantic association feature vector by the following fusion formula to obtain the data feature vector to be detected; wherein the fusion formula is: $V_i = \alpha V_1 + (1-\alpha) V_2$, where $V_1$ denotes the first-scale semantic association feature vector, $V_2$ denotes the second-scale semantic association feature vector, $\alpha$ denotes the weight parameter, and $V_i$ denotes the data feature vector to be detected.
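The fusion formula can be worked through in a few lines. This sketch assumes that both vectors have the same length, reads the "point-added" vector as a position-wise sum, and returns a cosine similarity of 0 when either vector has zero norm; the source does not address those cases.

```python
import math
from typing import List

Vector = List[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # assumed fallback for degenerate inputs
    return dot / (norm_a * norm_b)


def fuse(v1: Vector, v2: Vector) -> Vector:
    """Compute alpha * V1 + (1 - alpha) * V2 with alpha = cos(V1 - V2, V1 + V2)."""
    if len(v1) != len(v2):
        raise ValueError("both scale vectors must have the same length")
    difference = [a - b for a, b in zip(v1, v2)]
    summed = [a + b for a, b in zip(v1, v2)]
    alpha = cosine_similarity(difference, summed)
    return [alpha * a + (1.0 - alpha) * b for a, b in zip(v1, v2)]


fused = fuse([1.0, 0.0], [0.0, 1.0])
```

For vectors of equal norm the difference and the sum are orthogonal, so the weight is 0 and `fused` equals the second-scale vector. Because a cosine similarity lies in [-1, 1], the weight can be negative, in which case the result is an extrapolation rather than a convex combination.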
In the above data security protection integrated system, the sensitive vocabulary semantic understanding module includes: an embedding vectorization unit, configured to map each sensitive word in the sensitive vocabulary set into a word embedding vector by using the embedding layer of the context encoder to obtain a sequence of word embedding vectors; a semantic encoding unit, configured to perform global context semantic encoding on the sequence of word embedding vectors by using the transformer-based BERT model of the context encoder to obtain a plurality of word feature vectors; and a concatenation unit, configured to concatenate the plurality of word feature vectors to obtain the sensitive data feature vector.
In the above data security protection integrated system, the transfer calculation module is configured to calculate the transfer matrix between the data feature vector to be detected and the sensitive data feature vector according to the following transfer formula;
wherein the transfer formula is: [formula not reproduced in the source text]
wherein $V_a$ denotes the data feature vector to be detected, $V_b$ denotes the sensitive data feature vector, $M$ denotes the transfer matrix, and the operator in the formula denotes multiplication of the matrix by the vector.
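Since the transfer formula itself is missing from the text, the following is only one possible reading. It assumes the relation has the form $V_a = M V_b$ (the direction is an assumption) and picks the minimum-Frobenius-norm matrix that satisfies it, $M = V_a V_b^{\top} / (V_b^{\top} V_b)$. The relation alone does not determine $M$ uniquely, so this choice is a swapped-in technique, not the patent's stated method.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def transfer_matrix(v_a: Vector, v_b: Vector) -> Matrix:
    """Minimum-norm M with M @ v_b == v_a (assumed reading of the transfer formula)."""
    norm_sq = sum(x * x for x in v_b)
    if norm_sq == 0.0:
        raise ValueError("v_b must be non-zero")
    # Outer product of v_a and v_b, scaled by 1 / ||v_b||^2.
    return [[a * b / norm_sq for b in v_b] for a in v_a]


def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(x * y for x, y in zip(row, v)) for row in m]


M = transfer_matrix([1.0, 2.0], [3.0, 4.0, 0.0])
recovered = matvec(M, [3.0, 4.0, 0.0])
```

The resulting matrix has one row per component of $V_a$ and one column per component of $V_b$, so the two feature vectors do not need to share a length before the matrix is passed to the classifier.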
According to another aspect of the present application, there is provided a data security protection integrated method, comprising:
acquiring data to be detected and a sensitive vocabulary set;
passing the data to be detected through a context encoder comprising an embedding layer to obtain a plurality of context semantic feature vectors;
concatenating the plurality of context semantic feature vectors to obtain a first-scale semantic association feature vector;
arranging the plurality of context semantic feature vectors two-dimensionally into a context semantic feature matrix, and then obtaining a second-scale semantic association feature vector through a convolutional neural network model comprising a plurality of mixed convolution layers;
performing interpolation-order fusion on the first-scale semantic association feature vector and the second-scale semantic association feature vector to obtain a data feature vector to be detected;
passing the sensitive vocabulary set through a context encoder comprising an embedding layer to obtain a sensitive data feature vector;
calculating a transfer matrix between the data feature vector to be detected and the sensitive data feature vector as a classification feature matrix;
and passing the classification feature matrix through a classifier to obtain a classification result, wherein the classification result indicates whether the data to be detected is sensitive data.
In the above data security protection integrated method, passing the data to be detected through a context encoder comprising an embedding layer to obtain a plurality of context semantic feature vectors includes: mapping the text data of each data item in the data to be detected into a word embedding vector by using the embedding layer of the context encoder; appending the numerical data in each data item to the tail of the word embedding vector of that data item to obtain a plurality of data item embedding vectors; and performing context semantic encoding on the plurality of data item embedding vectors by using the transformer-based BERT model of the context encoder to obtain the plurality of context semantic feature vectors.
In the above data security protection integrated method, performing context semantic encoding on the plurality of data item embedding vectors by using the transformer-based BERT model of the context encoder to obtain the plurality of context semantic feature vectors includes: arranging the plurality of data item embedding vectors one-dimensionally to obtain a data item global embedding vector; calculating the product between the data item global embedding vector and the transpose of each of the plurality of data item embedding vectors to obtain a plurality of self-attention association matrices; normalizing each of the plurality of self-attention association matrices to obtain a plurality of normalized self-attention association matrices; passing each of the plurality of normalized self-attention association matrices through a classification function to obtain a plurality of probability values; and weighting each of the plurality of data item embedding vectors by the corresponding probability value to obtain the plurality of context semantic feature vectors.
Compared with the prior art, the data security protection integrated system and the method thereof provided by the application adopt an artificial intelligence detection algorithm based on deep learning to extract feature information from the data to be detected and from the sensitive vocabulary, and then calculate a transfer matrix between the features of the data to be detected and the features of the sensitive vocabulary to represent the feature similarity between the two, so as to judge whether the data is sensitive data. Thus, a large amount of data can be processed automatically, and the accuracy and efficiency of sensitive data identification are improved.
Drawings
The foregoing and other objects, features and advantages of the present application will become more apparent from the following more particular description of embodiments of the present application, as illustrated in the accompanying drawings. The accompanying drawings are included to provide a further understanding of embodiments of the application and are incorporated in and constitute a part of this specification; they illustrate the application and do not constitute a limitation of the application. In the drawings, like reference numerals generally refer to like parts or steps.
Fig. 1 is a block diagram of a data security protection integrated system according to an embodiment of the present application.
Fig. 2 is a schematic architecture diagram of a data security protection integrated system according to an embodiment of the present application.
Fig. 3 is a block diagram of the to-be-detected data semantic understanding module in the data security protection integrated system according to an embodiment of the present application.
Fig. 4 is a block diagram of the context encoding unit in the data security protection integrated system according to an embodiment of the present application.
Fig. 5 is a block diagram of the sensitive vocabulary semantic understanding module in the data security protection integrated system according to an embodiment of the present application.
Fig. 6 is a flowchart of a data security protection integration method according to an embodiment of the present application.
Detailed Description
Hereinafter, example embodiments according to the present application will be described in detail with reference to the accompanying drawings. It should be apparent that the described embodiments are only some of the embodiments of the present application and not all of the embodiments of the present application, and it should be understood that the present application is not limited by the example embodiments described herein.
Fig. 1 is a block diagram of a data security protection integrated system according to an embodiment of the present application. As shown in Fig. 1, a data security protection integrated system 100 according to an embodiment of the present application includes: a data acquisition module 110, configured to acquire data to be detected and a sensitive vocabulary set; a to-be-detected data semantic understanding module 120, configured to pass the data to be detected through a context encoder comprising an embedding layer to obtain a plurality of context semantic feature vectors; a first scale perception module 130, configured to concatenate the plurality of context semantic feature vectors to obtain a first-scale semantic association feature vector; a second scale perception module 140, configured to arrange the plurality of context semantic feature vectors two-dimensionally into a context semantic feature matrix and then obtain a second-scale semantic association feature vector through a convolutional neural network model comprising a plurality of mixed convolution layers; a multi-scale fusion module 150, configured to perform interpolation-order fusion on the first-scale semantic association feature vector and the second-scale semantic association feature vector to obtain a data feature vector to be detected; a sensitive vocabulary semantic understanding module 160, configured to pass the sensitive vocabulary set through a context encoder comprising an embedding layer to obtain a sensitive data feature vector; a transfer calculation module 170, configured to calculate a transfer matrix between the data feature vector to be detected and the sensitive data feature vector as a classification feature matrix; and a detection result generation module 180, configured to pass the classification feature matrix through a classifier to obtain a classification result, where the classification result indicates whether the data to be detected is sensitive data.
Fig. 2 is a schematic architecture diagram of a data security protection integrated system according to an embodiment of the present application. As shown in Fig. 2, the data to be detected and a sensitive vocabulary set are first acquired. The data to be detected is then passed through a context encoder comprising an embedding layer to obtain a plurality of context semantic feature vectors. Next, the plurality of context semantic feature vectors are concatenated to obtain a first-scale semantic association feature vector. In parallel, the plurality of context semantic feature vectors are arranged two-dimensionally into a context semantic feature matrix, and a second-scale semantic association feature vector is obtained through a convolutional neural network model comprising a plurality of mixed convolution layers. The first-scale semantic association feature vector and the second-scale semantic association feature vector then undergo interpolation-order fusion to obtain a data feature vector to be detected. Separately, the sensitive vocabulary set is passed through a context encoder comprising an embedding layer to obtain a sensitive data feature vector. A transfer matrix between the data feature vector to be detected and the sensitive data feature vector is then calculated as a classification feature matrix. Finally, the classification feature matrix is passed through a classifier to obtain a classification result, which indicates whether the data to be detected is sensitive data.
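The data flow of Fig. 2 can be summarized as a short skeleton. Every stage is passed in as a callable, and all parameter names (`encode_items`, `encode_vocab`, `conv_net`, `fuse`, `transfer`, `classify`) are hypothetical placeholders for the corresponding modules; this shows only the order of operations, not any module's internals.

```python
from typing import Callable, List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def detect_sensitive(
    items: Sequence[str],
    vocabulary: Sequence[str],
    encode_items: Callable[[Sequence[str]], List[Vector]],
    encode_vocab: Callable[[Sequence[str]], Vector],
    conv_net: Callable[[Matrix], Vector],
    fuse: Callable[[Vector, Vector], Vector],
    transfer: Callable[[Vector, Vector], Matrix],
    classify: Callable[[Matrix], bool],
) -> bool:
    # Module 120: context encoding of the data to be detected.
    context = encode_items(items)
    # Module 130: concatenate the context vectors (first scale).
    first_scale = [x for vec in context for x in vec]
    # Module 140: treat the context vectors as matrix rows and run the CNN (second scale).
    second_scale = conv_net([list(vec) for vec in context])
    # Module 150: fuse the two scales.
    data_vec = fuse(first_scale, second_scale)
    # Module 160: encode the sensitive vocabulary set.
    sensitive_vec = encode_vocab(vocabulary)
    # Module 170: transfer matrix as the classification feature matrix.
    matrix = transfer(data_vec, sensitive_vec)
    # Module 180: classifier decision.
    return classify(matrix)
```

A call with stub stages, for example lambdas that return fixed vectors, is enough to exercise the wiring before any real encoder or classifier is plugged in.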
In the above data security protection integrated system 100, the data acquisition module 110 is configured to acquire data to be detected and a sensitive vocabulary set. As mentioned in the background above, sensitive data identification is of great importance in data security protection; its main purpose is to identify and mark the sensitive data stored in a system so that corresponding security measures can be taken against access, leakage, or abuse by unauthorized persons. However, most data today is large in volume and complex, traditional manual sorting is slow, and different people may judge the same data differently, so the identification results for sensitive data are inconsistent. Therefore, an efficient and accurate sensitive data identification scheme is desired.
Accordingly, to achieve both accuracy and efficiency in identifying sensitive data, identification can be based on the feature similarity, in a high-dimensional space, between the implicit features of the data to be detected and the implicit features of the sensitive vocabulary. In other words, in the technical scheme of the application, an artificial intelligence detection algorithm based on deep learning is adopted to extract feature information from the data to be detected and from the sensitive vocabulary, and a transfer matrix between the features of the data to be detected and the features of the sensitive vocabulary is then calculated to represent the feature similarity between the two, so as to judge whether the data is sensitive data. Thus, a large amount of data can be processed automatically, and the accuracy and efficiency of sensitive data identification are improved. Specifically, in the technical scheme of the application, the data to be detected and a sensitive vocabulary set are first acquired.
In the above data security protection integrated system 100, the to-be-detected data semantic understanding module 120 is configured to pass the data to be detected through a context encoder comprising an embedding layer to obtain a plurality of context semantic feature vectors. In sensitive data identification, the context information of the data is critical to accurately determining whether the data is sensitive. Thus, to capture the context information and the semantic association features in the data to be detected, the data to be detected is processed through a context encoder comprising an embedding layer. It should be understood that a context encoder is a model for converting text data into a continuous vector representation. First, each word or character in the text data is mapped to an embedding vector by the embedding layer; then the transformer of the context encoder applies a self-attention mechanism to the embedding vectors, converting them into a plurality of context semantic feature vectors that comprehensively capture the semantic association features of the data, which improves the accuracy of sensitive data identification.
Fig. 3 is a block diagram of the to-be-detected data semantic understanding module in the data security protection integrated system according to an embodiment of the present application. As shown in Fig. 3, the to-be-detected data semantic understanding module 120 includes: an embedding unit 121, configured to map the text data of each data item in the data to be detected into a word embedding vector by using the embedding layer of the context encoder; a data adding unit 122, configured to append the numerical data in each data item to the tail of the word embedding vector of that data item to obtain a plurality of data item embedding vectors; and a context encoding unit 123, configured to perform context semantic encoding on the plurality of data item embedding vectors by using the transformer-based BERT model of the context encoder to obtain the plurality of context semantic feature vectors.
Fig. 4 is a block diagram of the context encoding unit in the data security protection integrated system according to an embodiment of the present application. As shown in Fig. 4, the context encoding unit 123 includes: a one-dimensional arrangement subunit 1231, configured to arrange the plurality of data item embedding vectors one-dimensionally to obtain a data item global embedding vector; a self-attention generation subunit 1232, configured to calculate the product between the data item global embedding vector and the transpose of each of the plurality of data item embedding vectors to obtain a plurality of self-attention association matrices; a normalized self-attention subunit 1233, configured to normalize each of the plurality of self-attention association matrices to obtain a plurality of normalized self-attention association matrices; a weight generation subunit 1234, configured to pass each of the plurality of normalized self-attention association matrices through a classification function to obtain a plurality of probability values; and a weighting subunit 1235, configured to weight each of the plurality of data item embedding vectors by the corresponding probability value to obtain the plurality of context semantic feature vectors.
In the above data security protection integrated system 100, the first scale perception module 130 is configured to concatenate the plurality of context semantic feature vectors to obtain a first-scale semantic association feature vector. To take the semantic information in all of the context semantic feature vectors into account and to capture the semantic association features of the data to be detected globally, the context semantic feature vectors need to be combined. The concatenation operation integrates the plurality of context semantic feature vectors into a single global feature representation, which better reflects the overall semantic association features of the data to be detected.
In the above data security protection integrated system 100, the second scale perception module 140 is configured to arrange the plurality of context semantic feature vectors two-dimensionally into a context semantic feature matrix and then obtain a second-scale semantic association feature vector through a convolutional neural network model comprising a plurality of mixed convolution layers. To further extract local and global semantic association features of the data, the plurality of context semantic feature vectors are processed with a convolutional neural network model. It should be understood that arranging the plurality of context semantic feature vectors into the context semantic feature matrix organizes the different pieces of context semantic information in a two-dimensional space, which helps the model understand the relationships between contexts and extract local semantic association features. Meanwhile, convolutional neural networks have good feature extraction capability in image and text processing, and a mixed convolution layer can capture local detail and global context at the same time. Therefore, by applying convolution operations to the context semantic feature matrix, semantic association features of different scales can be effectively extracted from it.
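The two arrangements of the same context semantic feature vectors, concatenation for the first scale and a two-dimensional matrix for the second, can be contrasted with a small sketch. Treating each vector as one matrix row and requiring equal vector lengths are assumptions; the source does not say how the two-dimensional arrangement is laid out.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def cascade(vectors: List[Vector]) -> Vector:
    """First scale input: join all context vectors end to end."""
    return [x for vec in vectors for x in vec]


def arrange_2d(vectors: List[Vector]) -> Matrix:
    """Second scale input: one context vector per matrix row (assumed layout)."""
    if len({len(vec) for vec in vectors}) > 1:
        raise ValueError("all context vectors must have the same length")
    return [list(vec) for vec in vectors]


context = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
flat = cascade(context)      # length 6
grid = arrange_2d(context)   # 3 rows by 2 columns
```

The flat form discards which values belonged to which data item, while the matrix form keeps neighbouring items in neighbouring rows, which is what lets a convolution kernel respond to local patterns across items.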
Accordingly, in one specific example, the second scale perception module 140 is configured to process the input data in the forward pass of each hybrid convolutional layer of the convolutional neural network model as follows: performing convolutional encoding on the context semantic feature matrix using a first convolution kernel with a first size to obtain a first-scale feature map; performing convolutional encoding on the context semantic feature matrix using a second convolution kernel with a first dilation rate to obtain a second-scale feature map; performing convolutional encoding on the context semantic feature matrix using a third convolution kernel with a second dilation rate to obtain a third-scale feature map; performing convolutional encoding on the context semantic feature matrix using a fourth convolution kernel with a third dilation rate to obtain a fourth-scale feature map, wherein the first, second, third and fourth convolution kernels have the same size, and the second, third and fourth convolution kernels have different dilation rates; aggregating the first-scale, second-scale, third-scale and fourth-scale feature maps along the channel dimension to obtain an aggregated feature map; performing global pooling on each feature matrix of the aggregated feature map along the channel dimension to generate a pooled feature map; and performing activation processing on the pooled feature map to generate an activated feature map; wherein the output of the last layer of the convolutional neural network model including the plurality of hybrid convolutional layers is the second-scale semantic association feature vector.
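The forward pass of one hybrid convolutional layer described above can be sketched in NumPy as follows. This is only an illustrative sketch under several assumptions that the application does not state: a single input channel, 3x3 kernels, dilation rates of 1, 2, 3 and 4 for the four branches (rate 1 standing for the ordinary first kernel), global average pooling, and ReLU as the activation. The names `dilated_conv2d` and `hybrid_conv_layer` are hypothetical, and the random kernels stand in for learned weights.

```python
import numpy as np

def dilated_conv2d(x, kernel, dilation=1):
    """Zero-padded, same-size 2-D cross-correlation with a square odd-sized
    kernel whose taps are spaced `dilation` positions apart."""
    k = kernel.shape[0]
    pad = dilation * (k - 1) // 2
    xp = np.pad(x, pad)  # zero padding on every side
    out = np.zeros_like(x, dtype=float)
    for i in range(k):
        for j in range(k):
            di, dj = i * dilation, j * dilation
            out += kernel[i, j] * xp[di:di + x.shape[0], dj:dj + x.shape[1]]
    return out

def hybrid_conv_layer(x, kernels, dilations=(1, 2, 3, 4)):
    """Four parallel branches, channel-wise aggregation, global pooling, activation."""
    maps = [dilated_conv2d(x, k, d) for k, d in zip(kernels, dilations)]
    aggregated = np.stack(maps, axis=0)       # shape (4, H, W): channel dimension first
    pooled = aggregated.mean(axis=(1, 2))     # assumed global average pooling per channel
    return np.maximum(pooled, 0.0)            # assumed ReLU activation

rng = np.random.default_rng(0)
# Five context semantic feature vectors of length 8, stacked row by row
# into a context semantic feature matrix (placeholder values).
feature_matrix = rng.normal(size=(5, 8))
kernels = [rng.normal(size=(3, 3)) for _ in range(4)]  # same size for all branches
v2 = hybrid_conv_layer(feature_matrix, kernels)
print(v2.shape)
```

Because all four kernels share one size and differ only in dilation rate, the branches see receptive fields of different extents at the same parameter cost, which is how a single layer covers both local detail and wider context.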
In the above data security protection integrated system 100, the multi-scale fusion module 150 is configured to perform interpolation-ordered fusion on the first-scale semantic association feature vector and the second-scale semantic association feature vector to obtain a data feature vector to be detected. To make full use of the semantic association features at different scales, the first-scale semantic association feature vector and the second-scale semantic association feature vector are fused. It should be understood that the first-scale semantic association feature vector focuses more on the overall semantic association features of the data, whereas the second-scale semantic association feature vector focuses more on local and global semantic associations. Fusing the features of the two scales combines the information carried by each, so that the data to be detected is described and represented more completely and the accuracy and robustness of sensitive data identification are improved.
In particular, in the technical solution of the present application, the first-scale semantic association feature vector and the second-scale semantic association feature vector represent different high-dimensional feature manifolds in the high-dimensional feature space, but in the class probability label domain both point to the same class probability label. The high-dimensional feature manifolds of the two vectors are therefore implicitly associated at the manifold expression level; that is, they exhibit smoothness and robustness at the manifold characterization level. Based on this, the manifold difference and the manifold superposition expansion of the two vectors in the high-dimensional feature space are represented by the position-wise difference and the position-wise point addition, respectively, and the cosine similarity between the resulting differential feature vector and point-added feature vector is used to represent the smoothness and robustness of the two high-dimensional manifolds at the manifold characterization level.
Further, the cosine similarity between the differential feature vector and the point-added feature vector is taken as a weight parameter, and the first-scale semantic association feature vector and the second-scale semantic association feature vector are fused according to the following formula to obtain the data feature vector to be detected: $V_i = \alpha V_1 + (1-\alpha) V_2$, where $V_1$ represents the first-scale semantic association feature vector, $V_2$ represents the second-scale semantic association feature vector, $\alpha$ represents the weight parameter, and $V_i$ represents the data feature vector to be detected. In this way, the high-dimensional feature manifold of the data feature vector to be detected is collinear at the geometric level with those of the first-scale and second-scale semantic association feature vectors, but differs from them in manifold range and manifold measure, and is consistent under manifold transformation from the algebraic point of view. The fusion can thus exploit the high-dimensional implicit association between the two vectors to improve the smoothness and robustness of the fused data feature vector to be detected, and thereby improve the accuracy with which the classifier classifies the final classification feature matrix.
Accordingly, in one specific example, the multi-scale fusion module 150 includes: a difference calculation unit, configured to calculate the position-wise difference between the first-scale semantic association feature vector and the second-scale semantic association feature vector to obtain a differential feature vector; a position-wise weighting unit, configured to calculate the position-wise weighting (point addition) between the first-scale semantic association feature vector and the second-scale semantic association feature vector to obtain a point-added feature vector; a cosine similarity calculation unit, configured to calculate the cosine similarity between the differential feature vector and the point-added feature vector; and a weighted fusion unit, configured to use the cosine similarity between the differential feature vector and the point-added feature vector as a weight parameter and to fuse the first-scale semantic association feature vector and the second-scale semantic association feature vector by the following fusion formula to obtain the data feature vector to be detected; wherein the fusion formula is: $V_i = \alpha V_1 + (1-\alpha) V_2$, where $V_1$ represents the first-scale semantic association feature vector, $V_2$ represents the second-scale semantic association feature vector, $\alpha$ represents the weight parameter, and $V_i$ represents the data feature vector to be detected.
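The four units of the multi-scale fusion module can be traced in a short NumPy sketch. It assumes that the two input vectors already have the same length (the application does not say how their lengths are aligned) and adds a small `eps` to the denominator to avoid division by zero; the function name `fuse` and the toy vectors are hypothetical.

```python
import numpy as np

def fuse(v1, v2, eps=1e-12):
    """Fuse two equal-length vectors as alpha * v1 + (1 - alpha) * v2, where
    alpha is the cosine similarity between (v1 - v2) and (v1 + v2)."""
    diff = v1 - v2     # differential feature vector (position-wise difference)
    added = v1 + v2    # point-added feature vector (position-wise sum)
    alpha = float(diff @ added / (np.linalg.norm(diff) * np.linalg.norm(added) + eps))
    return alpha * v1 + (1.0 - alpha) * v2, alpha

v1 = np.array([2.0, 0.0])  # toy first-scale vector
v2 = np.array([0.0, 1.0])  # toy second-scale vector
fused, alpha = fuse(v1, v2)
print(alpha, fused)
```

Since the dot product of the difference and the sum equals the squared norm of the first vector minus that of the second, the weight is positive when the first-scale vector has the larger norm and negative otherwise. As a cosine similarity it lies in the interval from -1 to 1, so the fusion is not restricted to a convex combination.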
In the above data security protection integrated system 100, the sensitive vocabulary semantic understanding module 160 is configured to pass the sensitive vocabulary set through a context encoder including an embedding layer to obtain a sensitive data feature vector. Sensitive vocabulary sets are typically presented in text form, and sensitive words in text form cannot conveniently be compared or computed directly against the data to be detected. Therefore, a context encoder is used to mine the context semantic features of the sensitive vocabulary and to convert the sensitive vocabulary set into a computable and comparable vector representation for feature comparison and classification with the data to be detected.
Fig. 5 is a block diagram of the sensitive vocabulary semantic understanding module in the data security protection integrated system according to an embodiment of the present application. As shown in Fig. 5, the sensitive vocabulary semantic understanding module 160 includes: an embedding vectorization unit 161, configured to map each sensitive word in the sensitive vocabulary set into a word embedding vector using the embedding layer of the context encoder to obtain a sequence of word embedding vectors; a semantic encoding unit 162, configured to perform global context semantic encoding on the sequence of word embedding vectors using the Transformer-based BERT model of the context encoder to obtain a plurality of word feature vectors; and a concatenation unit 163, configured to concatenate the plurality of word feature vectors to obtain the sensitive data feature vector.
In the above data security protection integrated system 100, the transfer calculation module 170 is configured to calculate a transfer matrix between the feature vector of the data to be detected and the feature vector of the sensitive data as a classification feature matrix. In order to capture semantic associations and transfer features between data to be detected and sensitive data, a transfer matrix between the data feature vectors to be detected and the sensitive data feature vectors is further calculated. It should be appreciated that the transfer matrix may be regarded as a transformation matrix mapping the data feature vectors to be detected to sensitive data feature vectors. Each element in the matrix represents a transfer relationship between a certain dimension of the feature vector of the data to be detected and a corresponding dimension of the feature vector of the sensitive data, reflecting the similarity, the difference and the semantic transfer degree between the data to be detected and the sensitive data. Based on the characteristic information, a classifier is further used for judging whether the data to be detected belongs to the sensitive data category.
Accordingly, in one specific example, the transfer calculation module 170 is configured to calculate the transfer matrix between the data feature vector to be detected and the sensitive data feature vector according to the following transfer formula:
$V_b = M \otimes V_a$
wherein $V_a$ represents the data feature vector to be detected, $V_b$ represents the sensitive data feature vector, $M$ represents the transfer matrix, and $\otimes$ denotes multiplication of a matrix by a vector.
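Reading the transfer matrix as a matrix that maps the data feature vector to be detected onto the sensitive data feature vector, as the description of the transformation above suggests, one possible way to compute it is sketched below. The constraint alone does not determine the matrix uniquely, so the sketch picks the minimum-norm rank-one solution; that choice, the function name `transfer_matrix`, and the toy vectors are assumptions and are not taken from the application.

```python
import numpy as np

def transfer_matrix(v_a, v_b):
    """Return a matrix M with M @ v_a == v_b (rank-one, minimum Frobenius norm).
    v_a must be non-zero."""
    v_a = np.asarray(v_a, dtype=float)
    v_b = np.asarray(v_b, dtype=float)
    return np.outer(v_b, v_a) / (v_a @ v_a)

v_a = np.array([1.0, 2.0, 2.0])   # toy data feature vector to be detected
v_b = np.array([3.0, -1.0])       # toy sensitive data feature vector
M = transfer_matrix(v_a, v_b)
print(M.shape)
```

The resulting matrix has one row per dimension of the sensitive data feature vector and one column per dimension of the data feature vector to be detected, so the two vectors need not have the same length.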
In the above data security protection integrated system 100, the detection result generation module 180 is configured to pass the classification feature matrix through a classifier to obtain a classification result, where the classification result indicates whether the data to be detected is sensitive data. The classifier is a trained machine learning model. It is typically trained on labeled training data that includes features of the data to be detected and the corresponding class labels (sensitive data or non-sensitive data). Through training, the classifier learns the association between features and classes and can then classify unseen data. Here, the classification feature matrix is taken as input, and a classification result indicating whether the data to be detected is sensitive data is obtained. In this way, the data to be detected is classified automatically and sensitive data is identified, so that further processing or decisions can be made according to the classification result to ensure the security of the data.
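The application does not specify the architecture of the classifier. As a hedged sketch, the inference step could look like the following, assuming a single fully connected layer followed by a softmax over two classes; the names `softmax` and `classify`, the label strings, and the random weights (standing in for trained parameters) are all illustrative assumptions.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a 1-D array of logits."""
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def classify(feature_matrix, weights, bias):
    """Flatten the classification feature matrix, apply one fully connected
    layer, and return the predicted label with the class probabilities."""
    x = feature_matrix.ravel()
    probs = softmax(weights @ x + bias)
    labels = ("non-sensitive", "sensitive")
    return labels[int(np.argmax(probs))], probs

rng = np.random.default_rng(0)
feature_matrix = rng.normal(size=(3, 4))   # placeholder classification feature matrix
weights = rng.normal(size=(2, 12))         # untrained placeholder weights
bias = np.zeros(2)
label, probs = classify(feature_matrix, weights, bias)
print(label, probs)
```

In practice the weights would come from training on labeled examples, as the paragraph above describes; with random weights the printed label carries no meaning.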
In summary, the data security protection integrated system according to the embodiment of the present application has been described. It uses a deep-learning-based artificial intelligence detection algorithm to extract feature information from the data to be detected and from the sensitive vocabulary, and then calculates a transfer matrix between the features of the data to be detected and the features of the sensitive vocabulary to represent their feature similarity, thereby judging whether the data is sensitive data. In this way, a large amount of data can be processed automatically, and the accuracy and efficiency of sensitive data identification are improved.
Fig. 6 is a flowchart of a data security protection integration method according to an embodiment of the present application. As shown in Fig. 6, the data security protection integration method according to an embodiment of the present application includes the steps of: S110, acquiring data to be detected and a sensitive vocabulary set; S120, passing the data to be detected through a context encoder including an embedding layer to obtain a plurality of context semantic feature vectors; S130, concatenating the plurality of context semantic feature vectors to obtain a first-scale semantic association feature vector; S140, arranging the plurality of context semantic feature vectors two-dimensionally into a context semantic feature matrix, and then obtaining a second-scale semantic association feature vector through a convolutional neural network model including a plurality of hybrid convolutional layers; S150, performing interpolation-ordered fusion on the first-scale semantic association feature vector and the second-scale semantic association feature vector to obtain a data feature vector to be detected; S160, passing the sensitive vocabulary set through a context encoder including an embedding layer to obtain a sensitive data feature vector; S170, calculating a transfer matrix between the data feature vector to be detected and the sensitive data feature vector as a classification feature matrix; and S180, passing the classification feature matrix through a classifier to obtain a classification result, where the classification result indicates whether the data to be detected is sensitive data.
Here, it will be understood by those skilled in the art that the specific operations of the respective steps in the above data security protection integration method have been described in detail in the above description of the data security protection integrated system with reference to Figs. 1 to 5, and thus repetitive descriptions thereof are omitted.

Claims (10)

1. A data security protection integrated system, characterized by comprising:
a data acquisition module, configured to acquire data to be detected and a sensitive vocabulary set;
a semantic understanding module for the data to be detected, configured to pass the data to be detected through a context encoder including an embedding layer to obtain a plurality of context semantic feature vectors;
a first scale perception module, configured to concatenate the plurality of context semantic feature vectors to obtain a first-scale semantic association feature vector;
a second scale perception module, configured to arrange the plurality of context semantic feature vectors two-dimensionally into a context semantic feature matrix and then pass the matrix through a convolutional neural network model including a plurality of hybrid convolutional layers to obtain a second-scale semantic association feature vector;
a multi-scale fusion module, configured to perform interpolation-ordered fusion on the first-scale semantic association feature vector and the second-scale semantic association feature vector to obtain a data feature vector to be detected;
a sensitive vocabulary semantic understanding module, configured to pass the sensitive vocabulary set through a context encoder including an embedding layer to obtain a sensitive data feature vector;
a transfer calculation module, configured to calculate a transfer matrix between the data feature vector to be detected and the sensitive data feature vector as a classification feature matrix; and
a detection result generation module, configured to pass the classification feature matrix through a classifier to obtain a classification result, the classification result indicating whether the data to be detected is sensitive data.

2. The data security protection integrated system according to claim 1, characterized in that the semantic understanding module for the data to be detected comprises:
an embedding unit, configured to use the embedding layer of the context encoder to map the text data of each data item in the data to be detected into a word embedding vector;
a data adding unit, configured to append the numerical data of each data item to the tail of the word embedding vector of that data item to obtain a plurality of data item embedding vectors; and
a context encoding unit, configured to use the Transformer-based BERT model of the context encoder to perform context semantic encoding on the plurality of data item embedding vectors to obtain the plurality of context semantic feature vectors.

3. The data security protection integrated system according to claim 2, characterized in that the context encoding unit comprises:
a one-dimensional arrangement subunit, configured to arrange the plurality of data item embedding vectors one-dimensionally to obtain a data item global embedding vector;
a self-attention generation subunit, configured to calculate the product between the data item global embedding vector and the transpose of each data item embedding vector in the plurality of data item embedding vectors to obtain a plurality of self-attention correlation matrices;
a standardized self-attention subunit, configured to standardize each self-attention correlation matrix in the plurality of self-attention correlation matrices to obtain a plurality of standardized self-attention correlation matrices;
a weight generation subunit, configured to pass each standardized self-attention correlation matrix in the plurality of standardized self-attention correlation matrices through a classification function to obtain a plurality of probability values; and
a weighting subunit, configured to weight each data item embedding vector in the plurality of data item embedding vectors with the corresponding probability value in the plurality of probability values to obtain the plurality of context semantic feature vectors.

4. The data security protection integrated system according to claim 3, characterized in that the second scale perception module is configured to process the input data in the forward pass of each hybrid convolutional layer of the convolutional neural network model as follows:
performing convolutional encoding on the context semantic feature matrix using a first convolution kernel with a first size to obtain a first-scale feature map;
performing convolutional encoding on the context semantic feature matrix using a second convolution kernel with a first dilation rate to obtain a second-scale feature map;
performing convolutional encoding on the context semantic feature matrix using a third convolution kernel with a second dilation rate to obtain a third-scale feature map;
performing convolutional encoding on the context semantic feature matrix using a fourth convolution kernel with a third dilation rate to obtain a fourth-scale feature map, wherein the first convolution kernel, the second convolution kernel, the third convolution kernel and the fourth convolution kernel have the same size, and the second convolution kernel, the third convolution kernel and the fourth convolution kernel have different dilation rates;
aggregating the first-scale feature map, the second-scale feature map, the third-scale feature map and the fourth-scale feature map along the channel dimension to obtain an aggregated feature map;
performing global pooling on each feature matrix of the aggregated feature map along the channel dimension to generate a pooled feature map; and
performing activation processing on the pooled feature map to generate an activated feature map;
wherein the output of the last layer of the convolutional neural network model including the plurality of hybrid convolutional layers is the second-scale semantic association feature vector.

5. The data security protection integrated system according to claim 4, characterized in that the multi-scale fusion module comprises:
a difference calculation unit, configured to calculate the position-wise difference between the first-scale semantic association feature vector and the second-scale semantic association feature vector to obtain a differential feature vector;
a position-wise weighting unit, configured to calculate the position-wise weighting between the first-scale semantic association feature vector and the second-scale semantic association feature vector to obtain a point-added feature vector;
a cosine similarity calculation unit, configured to calculate the cosine similarity between the differential feature vector and the point-added feature vector; and
a weighted fusion unit, configured to use the cosine similarity between the differential feature vector and the point-added feature vector as a weight parameter and to fuse the first-scale semantic association feature vector and the second-scale semantic association feature vector by the following fusion formula to obtain the data feature vector to be detected;
wherein the fusion formula is: $V_i = \alpha V_1 + (1-\alpha) V_2$, where $V_1$ represents the first-scale semantic association feature vector, $V_2$ represents the second-scale semantic association feature vector, $\alpha$ represents the weight parameter, and $V_i$ represents the data feature vector to be detected.

6. The data security protection integrated system according to claim 5, characterized in that the sensitive vocabulary semantic understanding module comprises:
an embedding vectorization unit, configured to use the embedding layer of the context encoder to map each sensitive word in the sensitive vocabulary set into a word embedding vector to obtain a sequence of word embedding vectors;
a semantic encoding unit, configured to use the Transformer-based BERT model of the context encoder to perform global context semantic encoding on the sequence of word embedding vectors to obtain a plurality of word feature vectors; and
a concatenation unit, configured to concatenate the plurality of word feature vectors to obtain the sensitive data feature vector.

7. The data security protection integrated system according to claim 6, characterized in that the transfer calculation module is configured to calculate the transfer matrix between the data feature vector to be detected and the sensitive data feature vector by the following transfer formula;
wherein the transfer formula is: $V_b = M \otimes V_a$, where $V_a$ represents the data feature vector to be detected, $V_b$ represents the sensitive data feature vector, $M$ represents the transfer matrix, and $\otimes$ denotes multiplication of a matrix by a vector.

8. A data security protection integration method, characterized by comprising:
acquiring data to be detected and a sensitive vocabulary set;
passing the data to be detected through a context encoder including an embedding layer to obtain a plurality of context semantic feature vectors;
concatenating the plurality of context semantic feature vectors to obtain a first-scale semantic association feature vector;
arranging the plurality of context semantic feature vectors two-dimensionally into a context semantic feature matrix and then passing the matrix through a convolutional neural network model including a plurality of hybrid convolutional layers to obtain a second-scale semantic association feature vector;
performing interpolation-ordered fusion on the first-scale semantic association feature vector and the second-scale semantic association feature vector to obtain a data feature vector to be detected;
passing the sensitive vocabulary set through a context encoder including an embedding layer to obtain a sensitive data feature vector;
calculating a transfer matrix between the data feature vector to be detected and the sensitive data feature vector as a classification feature matrix; and
passing the classification feature matrix through a classifier to obtain a classification result, the classification result indicating whether the data to be detected is sensitive data.

9. The data security protection integration method according to claim 8, characterized in that passing the data to be detected through a context encoder including an embedding layer to obtain a plurality of context semantic feature vectors comprises:
using the embedding layer of the context encoder to map the text data of each data item in the data to be detected into a word embedding vector;
appending the numerical data of each data item to the tail of the word embedding vector of that data item to obtain a plurality of data item embedding vectors; and
using the Transformer-based BERT model of the context encoder to perform context semantic encoding on the plurality of data item embedding vectors to obtain the plurality of context semantic feature vectors.

10. The data security protection integration method according to claim 9, characterized in that using the Transformer-based BERT model of the context encoder to perform context semantic encoding on the plurality of data item embedding vectors to obtain the plurality of context semantic feature vectors comprises:
arranging the plurality of data item embedding vectors one-dimensionally to obtain a data item global embedding vector;
calculating the product between the data item global embedding vector and the transpose of each data item embedding vector in the plurality of data item embedding vectors to obtain a plurality of self-attention correlation matrices;
standardizing each self-attention correlation matrix in the plurality of self-attention correlation matrices to obtain a plurality of standardized self-attention correlation matrices;
passing each standardized self-attention correlation matrix in the plurality of standardized self-attention correlation matrices through a classification function to obtain a plurality of probability values; and
weighting each data item embedding vector in the plurality of data item embedding vectors with the corresponding probability value in the plurality of probability values to obtain the plurality of context semantic feature vectors.
CN202311376336.3A 2023-10-23 2023-10-23 Data security protection integrated system and method Withdrawn CN117474006A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311376336.3A CN117474006A (en) 2023-10-23 2023-10-23 Data security protection integrated system and method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311376336.3A CN117474006A (en) 2023-10-23 2023-10-23 Data security protection integrated system and method

Publications (1)

Publication Number Publication Date
CN117474006A true CN117474006A (en) 2024-01-30

Family

ID=89635650

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311376336.3A Withdrawn CN117474006A (en) 2023-10-23 2023-10-23 Data security protection integrated system and method

Country Status (1)

Country Link
CN (1) CN117474006A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118628793A (en) * 2024-05-21 2024-09-10 东阳市铭品日用品有限公司 Grain and oil quality management system and method thereof
WO2025245793A1 (en) * 2024-05-30 2025-12-04 福建中信网安信息科技有限公司 Integrated system and method for data security protection

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11194972B1 (en) * 2021-02-19 2021-12-07 Institute Of Automation, Chinese Academy Of Sciences Semantic sentiment analysis method fusing in-depth features and time sequence models
WO2022134759A1 (en) * 2020-12-21 2022-06-30 深圳壹账通智能科技有限公司 Keyword generation method and apparatus, and electronic device and computer storage medium
CN115456789A (en) * 2022-11-10 2022-12-09 杭州衡泰技术股份有限公司 Abnormal transaction detection method and system based on transaction pattern recognition
CN116524603A (en) * 2023-05-15 2023-08-01 开元华创科技(集团)有限公司 Electronic signature method, system and storage medium for informationized detection laboratory achievements
CN116595551A (en) * 2023-05-15 2023-08-15 广州微明信息科技有限公司 Bank transaction data management method and system
CN116702156A (en) * 2023-06-20 2023-09-05 安徽百方云科技有限公司 Information security risk evaluation system and method thereof

Similar Documents

Publication Publication Date Title
CN116309580B (en) Corrosion detection method for oil and gas pipelines based on magnetic stress
CN113221567A (en) Judicial domain named entity and relationship combined extraction method
CN107798033B (en) A classification method of case texts in the field of public security
CN110830489B (en) Method and system for detecting counterattack type fraud website based on content abstract representation
CN117474006A (en) Data security protection integrated system and method
CN113032525A (en) False news detection method and device, electronic equipment and storage medium
CN111985538A (en) Small sample picture classification model and method based on semantic auxiliary attention mechanism
CN116722992A (en) Fraud website identification method and device based on multi-mode fusion
CN110675269A (en) Text auditing method and device
CN117516937A (en) Unknown fault detection method of rolling bearing based on multi-modal feature fusion enhancement
CN118154988B (en) Automatic monitoring system and method for infringing and counterfeit goods
CN115758159A (en) Zero sample text position detection method based on mixed contrast learning and generation type data enhancement
CN116595551A (en) Bank transaction data management method and system
CN119829766B (en) Telecommunication fraud text classification detection method based on RoBERTa model
CN119761373A (en) A method and system for identifying sensitive information for file review and opening
CN113326703B (en) Emotion recognition method and system based on multi-modal confrontation fusion in heterogeneous space
CN111815108A (en) An evaluation method for power grid engineering design change and on-site visa approval form
CN118689990A (en) Cross-modal image-text retrieval method and system based on semantic association similarity learning
CN109558591A (en) Chinese event detection method and device
Wu et al. Text classification using triplet capsule networks
CN111523301B (en) Contract document compliance checking method and device
CN120145098A (en) A market violation detection method based on multimodal image and text fusion
CN119380163A (en) Fake news detection method based on comment-context dual collaborative masked Transformer model
WO2025245793A1 (en) Integrated system and method for data security protection
CN114911971B (en) Method and device for detecting fake person videos by integrating title information

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication (application publication date: 2024-01-30)