
CN115713450A - Table data watermarking method for resisting column deletion attack - Google Patents


Info

Publication number: CN115713450A
Application number: CN202211331263.1A
Authority: CN (China)
Other languages: Chinese (zh)
Prior art keywords: data, watermark, row, column, classification
Inventors: 罗森林, 杨宗源, 潘丽敏, 魏继勋
Original assignee: Beijing Institute of Technology (BIT)
Current assignee: Beijing Institute of Technology (BIT)
Application filed by: Beijing Institute of Technology (BIT)
Priority date: 2022-10-28
Filing date: 2022-10-28
Publication date: 2023-02-24
Legal status: Pending

(The legal status, assignee, and priority date listed here are assumptions, not legal conclusions; Google has not performed a legal analysis and makes no representation as to their accuracy.)

Landscapes

  • Editing Of Facsimile Originals (AREA)

Abstract

The invention relates to a table data watermarking method that resists column deletion attacks, and belongs to the technical field of computer and information science. First, the watermark column identifiers are determined by combining attribute importance with data distortion tolerance. Next, a feature repair classification model is built from cluster labels and damaged row data; the model classifies the original data, and the watermark row identifiers are determined from the class probabilities. The watermark embedding positions are then determined from the row and column identifiers, and the watermark information is embedded. Finally, in the watermark detection stage, the feature repair classification model determines the watermark row identifiers, and the watermark information is extracted in combination with the watermark column identifiers. To address the insufficient resistance of existing table data watermarking methods to column deletion attacks, the method builds a feature repair classification model that accurately recovers the row identifiers of attacked data, which effectively improves watermark detection accuracy.

Description

Table data watermarking method for resisting column deletion attack

Technical field

The invention relates to a table data watermarking method that resists column deletion attacks, and belongs to the technical field of computer and information science.

Background

Tabular data is an important data resource in fields such as medical diagnosis, financial decision-making, and industrial intelligence. Once stolen and misused, it seriously infringes the rights and interests of its owner. Table data watermarking is an effective means of copyright protection and provenance tracing for tabular data, so research on it is of great significance for protecting digital assets.

Current table data watermarking methods fall into three main categories:

1. Unique primary key method

The unique primary key method is the basis of mainstream watermarking methods. It hashes the secret key together with the primary key to determine the watermark positions. Different secret keys yield different positions, so unauthorized users cannot obtain the watermark information. However, the method presupposes that the table has a unique primary key. If the table has no primary key, or the primary key is deleted or modified, the watermark can no longer be detected.

2. Virtual primary key method

The virtual primary key method converts continuous attribute values to binary and splits them into high-order and low-order bits. The high-order bits are hashed to generate a virtual primary key, and the watermark is embedded in the low-order bits, which avoids the limitation of the unique primary key method. However, the method places high demands on the selected continuous attribute values: when the data is tampered with, the watermark becomes invalid. It also cannot generate virtual primary keys from discrete attribute values, so it does not make full use of the data.

3. Clustering grouping method

The clustering grouping method no longer computes hash values. Instead, it groups rows directly by clustering on a distance metric, and it can use continuous and discrete attribute values at the same time, which gives it stronger algorithmic security than the virtual primary key method. However, it likewise depends on the integrity of the attribute values used for clustering. If clustering attribute values are deleted, the mapping between the identifier and the attribute values is broken, the identifiers are computed incorrectly during watermark detection, and the watermark cannot be correctly identified.

In summary, existing table data watermarking methods rely too heavily on the primary key or on the selected attribute values and resist column deletion attacks poorly. The present invention therefore proposes a table data watermarking method that resists column deletion attacks.
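To make the dependency described above concrete, the following is a minimal sketch of hash-based row selection in the style of the unique primary key method. The patent only states that a hash of the secret key and the primary key determines the watermark position; the use of HMAC-SHA256, the modulus `gamma`, and the toy column names are assumptions made for illustration.

```python
import hashlib
import hmac

def is_marked(secret_key, primary_key, gamma):
    """Return True when the keyed hash of the primary key selects this row.

    HMAC-SHA256 and the 1-in-gamma selection rule are illustrative choices,
    not details taken from the patent.
    """
    digest = hmac.new(secret_key, primary_key.encode("utf-8"), hashlib.sha256).digest()
    return int.from_bytes(digest, "big") % gamma == 0

# Hypothetical toy table with a primary key column "id".
rows = [{"id": f"P{n:04d}", "glucose": 90 + n} for n in range(20)]
marked_before = [r["id"] for r in rows if is_marked(b"owner-secret", r["id"], gamma=4)]

# Column deletion attack: the primary key column is dropped, so the detector
# has no input for the hash and cannot relocate the marked rows.
attacked = [{k: v for k, v in r.items() if k != "id"} for r in rows]
recoverable = all("id" in r for r in attacked)
```

The same failure applies to any scheme whose row identifier is a deterministic function of columns that an attacker can delete, which is the gap the invention targets.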

Summary of the invention

The object of the invention is to address the insufficient resistance of table data watermarking methods to column deletion attacks by proposing a table data watermarking method that resists such attacks.

The design principle of the invention is as follows. First, important attribute columns are selected as the watermark column identifiers. Second, a clustering method produces cluster labels for the row data, damaged row data is constructed, and a feature repair classification model is built from the cluster labels and the damaged row data; the model classifies the original data, and the watermark row identifiers are determined from the class probabilities. Then the watermark information is encoded with an error-correcting code, the embedding positions are determined from the row and column identifiers, and the watermark information is embedded redundantly to obtain the watermarked data. Finally, the feature repair classification model determines the watermark positions, and the watermark code is extracted and decoded to recover the embedded watermark information.

The technical solution of the invention is implemented through the following steps:

Step 1: Select important continuous-variable attribute columns by combining attribute importance with data distortion tolerance, and determine the watermark column identifiers.

Step 2: Build the feature repair classification network model to determine the watermark row identifiers.

Step 2.1: Select the clustering features using filter-based feature selection.

Step 2.2: Perform unsupervised clustering on the selected features with the constrained FCM (fuzzy c-means) algorithm to obtain the cluster labels of the row data.

Step 2.3: Generate damaged row data with a mask vector, and train the feature repair classification network model on the cluster labels and the damaged row data.

Step 2.4: Use the model to compute the class probabilities of each row, add a group identifier to each original row according to the class probabilities, and select rows as the watermark row identifiers.

Step 3: Embed the watermark information redundantly into the original data.

Step 3.1: Encode the watermark information in binary and add an error-correcting code.

Step 3.2: Determine the watermark embedding positions from the watermark row and column identifiers, and embed the watermark code redundantly with the LSB (least significant bit) algorithm.

Step 4: Perform watermark detection on the watermarked data.

Step 4.1: Obtain the watermark row identifiers with the feature repair classification network, and determine the watermark embedding positions in combination with the watermark column identifiers.

Step 4.2: Extract and decode the watermark code to recover the watermark information.

Beneficial effects

Compared with the unique primary key method, the invention can embed a watermark in data that has no primary key.

Compared with the virtual primary key method, the invention selects the watermark row identifiers through unsupervised clustering, can use continuous and discrete attribute values at the same time, and makes full use of the data.

Compared with the clustering grouping method, the invention builds a feature repair classification model and uses feature repair encoding to classify damaged data correctly. It also selects the rows in which to embed redundant information according to the class probabilities output by the classification network, which reduces the distortion of the data's statistical characteristics.

Description of drawings

Fig. 1 is a schematic diagram of the table data watermarking method resisting column deletion attacks according to the invention.

Fig. 2 is a structure diagram of the feature repair classification network.

Detailed description

To better illustrate the purpose and advantages of the invention, an embodiment of the method is described in further detail below with reference to an example.

The experimental data comes from Checkup, a real biological information dataset. The data used in the watermarking experiment is listed in Table 1.

Table 1. Data watermarking experiment dataset

[Table 1 appears only as an image in the source: Figure BDA0003913307440000031]

The experiment uses the row identification accuracy $\mathrm{Acc}_{loc}$ as the evaluation metric. It measures how well a method recovers the row identifiers after column attributes that participate in the identifier computation have been deleted. The row identification accuracy is computed as:

$$\mathrm{Acc}_{loc} = \frac{1}{n}\sum_{j=1}^{n} \mathbb{1}\left[\, y(r_j) = \hat{y}(r_j) \,\right]$$

where $r_j$ is the table data of row $j$, $y$ is the group class before the column deletion attack, $\hat{y}$ is the group class after the column deletion attack, and $n$ is the number of rows.
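As an illustration of the metric, the sketch below computes the fraction of rows whose group class is unchanged by the attack. The function name and the toy label lists are hypothetical.

```python
def row_id_accuracy(before, after):
    """Fraction of rows whose group class after the attack equals the class before it."""
    if len(before) != len(after):
        raise ValueError("label sequences must have the same length")
    if not before:
        raise ValueError("at least one row is required")
    matches = sum(1 for y, y_hat in zip(before, after) if y == y_hat)
    return matches / len(before)

# Toy labels: rows 3 and 6 (0-based) change group after the attack, so 6 of 8 agree.
acc = row_id_accuracy([0, 1, 2, 1, 0, 2, 1, 0], [0, 1, 2, 0, 0, 2, 2, 0])
```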

The experiment was run on one computer and one server. The computer has an Intel i9-9900 CPU and 32 GB of RAM and runs 64-bit Windows 11. The server has a GeForce GTX 1080 Ti GPU and runs 64-bit Ubuntu Linux 20.04.

The experimental procedure is as follows:

Step 1: Sort the continuous attribute columns in descending order according to their variance $\sigma$ and mean $\mu$. The ranking score $T$ of an attribute column is:

$$T = \ln\mu + \log_{10}\sigma$$

Using this ranking as a reference, and taking into account two subjective factors, attribute importance and data distortion tolerance, select attribute columns as the column identifiers for watermark embedding.
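The ranking in step 1 can be sketched as follows. The score follows the formula T = ln μ + log10 σ, with σ taken as the population variance because the text calls it the variance; skipping columns with a non-positive mean or variance (where the logarithms are undefined) and the toy column names are assumptions. The final choice of columns remains a manual decision based on importance and distortion tolerance, which this sketch does not model.

```python
import math
import statistics

def rank_columns(table):
    """Rank continuous columns by T = ln(mean) + log10(variance), highest first."""
    scores = {}
    for name, values in table.items():
        mu = statistics.fmean(values)
        sigma = statistics.pvariance(values)  # the source calls sigma the variance
        if mu <= 0 or sigma <= 0:
            continue  # logarithm undefined; such columns are left out (assumption)
        scores[name] = math.log(mu) + math.log10(sigma)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Hypothetical columns: "flag" is constant and therefore skipped.
table = {
    "glucose": [90, 110, 130, 150],
    "height": [1.6, 1.7, 1.8, 1.9],
    "flag": [1, 1, 1, 1],
}
ranking = rank_columns(table)
```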

Step 2: Build the feature repair classification network model and use it to determine the watermark row identifiers.

Step 2.1: Use filter-based feature selection to compute the correlation coefficients between features and the variance of each feature. From the high-variance features, select those with high correlation coefficients as clustering features, which increases the redundancy of the clustering attributes. The number of selected features is $\max\{0.8k,\ ca\}$, where $k$ is the number of clusters and $ca$ is the number of continuous attribute columns.
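One possible reading of step 2.1 is sketched below. The patent does not say how "high variance" and "high correlation" are thresholded, so keeping a fixed fraction of the highest-variance columns (`variance_keep`) and ranking the survivors by their mean absolute Pearson correlation with each other are assumptions. The number of features to select is passed in as `n_select`; the source gives it as max{0.8k, ca}.

```python
import math
import statistics

def pearson(x, y):
    """Pearson correlation coefficient; 0.0 when either column is constant."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0

def select_cluster_features(table, n_select, variance_keep=0.75):
    """Filter-style selection: keep high-variance columns, then prefer correlated ones."""
    by_variance = sorted(table, key=lambda c: statistics.pvariance(table[c]), reverse=True)
    n_keep = max(n_select, math.ceil(len(by_variance) * variance_keep))
    candidates = by_variance[:n_keep]

    def mean_abs_corr(col):
        others = [c for c in candidates if c != col]
        if not others:
            return 0.0
        return statistics.fmean(abs(pearson(table[col], table[o])) for o in others)

    return sorted(candidates, key=mean_abs_corr, reverse=True)[:n_select]

# Hypothetical columns: "b" tracks "a" closely, "c" is unrelated, "d" is nearly constant.
table = {
    "a": [10, 20, 30, 40, 50, 60],
    "b": [21, 39, 62, 80, 101, 119],
    "c": [50, 10, 60, 20, 40, 30],
    "d": [1.0, 1.1, 1.0, 1.1, 1.0, 1.1],
}
selected = select_cluster_features(table, n_select=2)
```

Choosing mutually correlated features is what gives the later repair model something to work with: a deleted column can be partly inferred from the correlated columns that survive.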

Step 2.2: Perform unsupervised clustering on the clustering features with the constrained FCM algorithm. The objective function for training the constrained FCM model is:

[Equation appears only as an image in the source: Figure BDA0003913307440000034]

where $c_i$ denotes a cluster center, $r_j$ denotes a row of data, and the membership symbol (an image in the source, Figure BDA0003913307440000035) denotes the degree to which row $j$ belongs to class $i$, subject to the constraints that all clusters have the same size and that the memberships of each row sum to 1. The cluster label of each row is obtained from the clustering result.
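For orientation, the standard fuzzy c-means objective, on which a constrained variant of this kind would normally be built, can be written as below. This is the textbook form and not a transcription of the patent's equation image: the membership symbol $u_{ij}$, the fuzzifier $p > 1$, and the expression of the equal-size constraint as a fixed membership mass per cluster are all assumptions.

```latex
J = \sum_{i=1}^{k} \sum_{j=1}^{n} u_{ij}^{\,p} \, \lVert r_j - c_i \rVert^{2},
\qquad \text{s.t.} \quad
\sum_{i=1}^{k} u_{ij} = 1 \;\; \forall j,
\qquad
\sum_{j=1}^{n} u_{ij} = \frac{n}{k} \;\; \forall i .
```

The first constraint matches the stated requirement that the memberships of each row sum to 1; the second is one way to encode "all clusters have the same size".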

Step 2.3: Use a mask vector $m$ to generate the damaged row data $\tilde{r}$ as follows:

[Equation appears only as an image in the source: Figure BDA0003913307440000042]

where $r$ denotes the original row data and the mask vector is $m = [m_0, m_1, \dots, m_{\beta-1}]^T$, with each $m_i$ sampled from a Bernoulli distribution. (The symbols $\tilde{r}$, $\hat{r}$, and $\hat{y}$ used here stand in for symbol images in the source.)

The feature repair classification model is then trained as follows. The damaged data $\tilde{r}$ is fed into the autoencoder network for encoding. The feature repair network restores the resulting code $z$ to the repaired data $\hat{r}$; the loss against the original row data $r$ is computed with the mean squared error (MSE) and used to train the feature repair network. In parallel, the classification network classifies the code $z$ as $\hat{y}$; the loss against the cluster label $y$ is computed with the cross-entropy (CE) and used to train the classification network. The autoencoder network is trained on the combination of the two losses, so that the code carries information about both the original data and the cluster class it belongs to. The trained model provides both feature repair encoding and data classification, and its final output is the data classification result.
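The corruption step and the combined training loss can be sketched without any deep learning framework. The networks themselves are omitted; the example only shows how a Bernoulli mask could damage a row and how the MSE and cross-entropy terms could be combined. Zero-filling the masked features, the keep probability, and the equal weighting of the two losses are assumptions, since the source gives the corruption formula only as an image.

```python
import math
import random

def corrupt_row(row, keep_prob, rng):
    """Sample m_i ~ Bernoulli(keep_prob) and zero out the features where m_i = 0."""
    mask = [1 if rng.random() < keep_prob else 0 for _ in row]
    damaged = [m * x for m, x in zip(mask, row)]
    return mask, damaged

def mse(repaired, original):
    """Mean squared error between the repaired row and the original row."""
    return sum((a - b) ** 2 for a, b in zip(repaired, original)) / len(original)

def cross_entropy(logits, label):
    """Cross-entropy of softmax(logits) against an integer class label."""
    peak = max(logits)  # subtract the maximum for numerical stability
    log_norm = peak + math.log(sum(math.exp(v - peak) for v in logits))
    return log_norm - logits[label]

def joint_loss(repaired, original, logits, label, ce_weight=1.0):
    """Loss for the shared encoder: repair term plus weighted classification term."""
    return mse(repaired, original) + ce_weight * cross_entropy(logits, label)

rng = random.Random(0)
row = [0.8, 0.1, 0.5, 0.9]
mask, damaged = corrupt_row(row, keep_prob=0.5, rng=rng)

# Stand-ins for the outputs of the repair head and the classification head.
loss = joint_loss(repaired=[0.7, 0.2, 0.5, 0.8], original=row,
                  logits=[2.0, 0.5, -1.0], label=0)
```

Training on randomly masked rows mimics a column deletion attack at training time, which is why the classifier can later assign the right group to rows with missing columns.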

Step 2.4: Feed the original data into the feature repair classification model and apply Softmax to the model's output to obtain the probability that each row belongs to each class. The class with the highest probability becomes the group identifier of the row. Then compute the difference between the largest and smallest class probabilities, and select the rows whose difference exceeds a preset threshold as the watermark row identifiers.
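Step 2.4 can be sketched as follows, assuming the classifier returns one list of raw scores (logits) per row. The function names, the toy scores, and the threshold value of 0.6 are illustrative only.

```python
import math

def softmax(logits):
    """Convert raw class scores to probabilities."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def assign_rows(logit_rows, threshold):
    """Return the group identifier of every row and the indices of the watermark rows."""
    groups, carriers = [], []
    for idx, logits in enumerate(logit_rows):
        probs = softmax(logits)
        groups.append(max(range(len(probs)), key=probs.__getitem__))
        # A row carries the watermark only when the classifier is confident.
        if max(probs) - min(probs) > threshold:
            carriers.append(idx)
    return groups, carriers

# Rows 0 and 2 are classified confidently; row 1 is ambiguous and is not selected.
logit_rows = [[4.0, 0.0, 0.0], [0.2, 0.1, 0.0], [0.0, 0.0, 3.0]]
groups, carriers = assign_rows(logit_rows, threshold=0.6)
```

Restricting the watermark to confidently classified rows means a row is less likely to switch groups when some of its columns are deleted.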

Step 3: Embed the watermark information redundantly into the original data.

Step 3.1: Convert the watermark information to binary using ASCII encoding, then add an RS (Reed-Solomon) error-correcting code to the converted watermark to obtain the watermark code. The watermark code length $l$ must satisfy:

$$k \times (\alpha - 1) < l < k \times \alpha$$

where $k$ is the number of cluster classes and $\alpha$ is the number of column identifiers.
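The binary conversion and the length constraint can be illustrated as follows. The Reed-Solomon parity is omitted because the Python standard library has no RS codec; in a real pipeline the parity bits would be appended before splitting. Zero-padding the last substring up to length k is an assumption.

```python
def ascii_to_bits(message):
    """Encode an ASCII string as a list of bits, 8 bits per character."""
    return [int(bit) for byte in message.encode("ascii") for bit in format(byte, "08b")]

def split_code(bits, k):
    """Split a code of length l into alpha substrings of length k, k(alpha-1) < l < k*alpha."""
    if len(bits) % k == 0:
        raise ValueError("the strict bound k(alpha-1) < l < k*alpha needs l not divisible by k")
    alpha = len(bits) // k + 1
    padded = bits + [0] * (alpha * k - len(bits))  # zero padding is an assumption
    return [padded[i * k:(i + 1) * k] for i in range(alpha)]

bits = ascii_to_bits("BIT")          # 24 bits; RS parity bits would be appended here
substrings = split_code(bits, k=5)   # 5 * 4 < 24 < 5 * 5, so alpha = 5
```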

Step 3.2: Split the watermark code into $\alpha$ substrings of length $k$, denoted $\{W_0, W_1, \dots, W_{\alpha-1}\}$. Substring $W_i$ is embedded with LSB into the $k$-th group of the $i$-th column, as follows:

$$y_j.A_i = \left[\mathrm{LSB}\left([y_j.A_i]_2,\ \chi,\ j.W_i\right)\right]_{10}$$

where $j.W_i$ denotes the $j$-th bit of the $i$-th substring; $y_j.A_i$ denotes the data whose class is $y_j$ and whose column attribute is $A_i$, that is, the watermark embedding position determined by the row and column identifiers; LSB denotes least-significant-bit embedding; and $\chi$ is the number of embedding bits of $j.W_i$ in $y_j.A_i$. In addition, the complement of $j.W_i$ is embedded in the same way in the rows of the same group that were not selected as watermark embedding positions, which reduces the distortion of the data's statistical characteristics.
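A minimal sketch of the embedding for a single column is shown below. It assumes integer-valued attributes and interprets χ as the bit position to overwrite (0 being the least significant bit); both are assumptions, as are the function names and toy values. Bit j of the substring goes to the watermark rows of group j, and the complement goes to the remaining rows of that group.

```python
def set_bit(value, position, bit):
    """Return value with the bit at position (0 = least significant) set to bit."""
    return (value & ~(1 << position)) | (bit << position)

def embed_column(values, groups, carriers, substring, position=0):
    """Embed substring[g] in the watermark rows of group g and its complement elsewhere."""
    carrier_set = set(carriers)
    marked = []
    for idx, (value, group) in enumerate(zip(values, groups)):
        bit = substring[group]
        if idx not in carrier_set:
            bit ^= 1  # non-selected rows of the same group receive the opposite bit
        marked.append(set_bit(value, position, bit))
    return marked

values = [120, 97, 133, 88, 101, 140]   # one integer attribute column
groups = [0, 0, 1, 1, 0, 1]             # group identifier of each row
carriers = [0, 2, 5]                    # indices of the watermark rows
marked = embed_column(values, groups, carriers, substring=[1, 0])
```

With the least significant bit as the target, no value changes by more than 1, and the opposite bits written to the non-selected rows keep the bit distribution of each group roughly balanced.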

Step 4: Perform watermark detection on the watermarked data.

Step 4.1: Feed the watermarked data into the feature repair classification model and process the model output to obtain the watermark row identifiers. The processing is the same as in step 2.4, except that, to allow for distortion introduced during data transmission, the probability-difference threshold used for detection is smaller than the one used for embedding. The watermark embedding positions are then obtained from the watermark column identifiers retained by the watermark owner.

Step 4.2: Extract the watermark code by majority voting:

$$j.WB_i = \mathrm{Vote}\left(\mathrm{LSB}\left([y_j.A_i]_2,\ \chi\right)\right)$$

where $j.WB_i$ denotes the $j$-th bit of the $i$-th substring of the extracted watermark code. The rows of the same group that were not selected as watermark embedding positions are processed in the same way, and the watermark codes extracted from the identified rows and the non-identified rows are compared. If the two are the same, the probability-difference threshold in step 4.1 must be reselected until they differ. Finally, the binary watermark code $WB$ is decoded to recover the original watermark information, $message = \mathrm{ASCII}^{-1}(WB)$.
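The voting extraction and the final decoding can be sketched as follows, again assuming integer attributes and a single bit position. The data is the hypothetical output of an embedding step in which group 0 carries bit 1 and group 1 carries bit 0; error-correction decoding is omitted.

```python
from collections import Counter

def get_bit(value, position=0):
    """Read the bit at position (0 = least significant)."""
    return (value >> position) & 1

def extract_substring(values, groups, rows, k, position=0):
    """Majority vote per group over the given rows; None when a group has no rows."""
    votes = [Counter() for _ in range(k)]
    for idx in rows:
        votes[groups[idx]][get_bit(values[idx], position)] += 1
    return [v.most_common(1)[0][0] if v else None for v in votes]

def bits_to_ascii(bits):
    """Decode bits back to ASCII text, ignoring trailing bits that do not fill a byte."""
    usable = len(bits) - len(bits) % 8
    chars = [int("".join(str(b) for b in bits[i:i + 8]), 2) for i in range(0, usable, 8)]
    return bytes(chars).decode("ascii")

values = [121, 96, 132, 89, 100, 140]   # watermarked column
groups = [0, 0, 1, 1, 0, 1]
carriers, non_carriers = [0, 2, 5], [1, 3, 4]

extracted = extract_substring(values, groups, carriers, k=2)
check = extract_substring(values, groups, non_carriers, k=2)  # expected to differ
decoded = bits_to_ascii([0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1])
```

Comparing `extracted` with `check` mirrors the consistency test in step 4.2: because the non-selected rows hold the opposite bits, identical results would indicate that the threshold separates the two sets of rows badly.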

Test results: Using the table data watermarking method that resists column deletion attacks, the experiment performed watermark embedding, column deletion attacks, and watermark detection on the Checkup dataset. With 50% of the clustering feature attributes deleted, the invention reaches a row identification accuracy of 0.492, shows good resistance to column deletion attacks, and effectively strengthens the security of table data watermarks.

The specific description above explains the purpose, technical solution, and beneficial effects of the invention in further detail. It should be understood that the above is only a specific embodiment of the invention and does not limit its scope of protection. Any modification, equivalent replacement, or improvement made within the spirit and principles of the invention shall fall within its scope of protection.

Claims (4)

1. A table data watermarking method resisting column deletion attacks, characterized in that the method comprises the following steps:

Step 1: select important continuous-variable attribute columns by combining attribute importance with data distortion tolerance, and determine the watermark column identifiers;

Step 2: build a feature repair classification network model to determine the watermark row identifiers: first, select clustering features using filter-based feature selection; second, perform unsupervised clustering on the selected features with the constrained FCM algorithm to obtain the cluster labels of the row data; then, generate damaged row data with a mask vector and train the feature repair classification network model on the damaged row data; finally, use the model to compute the class probabilities of each row, add a group identifier to the original row data according to the class probabilities, and select rows as the watermark row identifiers;

Step 3: embed the watermark information redundantly into the original data: first, encode the watermark information in binary and add an error-correcting code; finally, determine the watermark embedding positions from the watermark row identifiers and watermark column identifiers, and embed the watermark code redundantly with the LSB algorithm;

Step 4: perform watermark detection on the watermarked data: first, obtain the watermark row identifiers with the feature repair classification network and determine the watermark embedding positions in combination with the watermark column identifiers; finally, extract and decode the watermark code to recover the watermark information.

2. The table data watermarking method resisting column deletion attacks according to claim 1, characterized in that: in step 2 the feature repair classification model is trained as follows: the damaged data $\tilde{r}$ is fed into the autoencoder network for encoding; the feature repair network restores the resulting code $z$ to the repaired data $\hat{r}$, and the loss against the original row data $r$ is computed with the mean squared error (MSE) to train the feature repair network; in parallel, the classification network classifies the code $z$ as $\hat{y}$, and the loss against the cluster label $t$ is computed with the cross-entropy (CE) to train the classification network; the autoencoder network is trained on the combination of the two losses, so that the code carries information about both the original data and the cluster class it belongs to; the trained model provides both feature repair encoding and data classification, and its final output is the data classification result. (The symbols $\tilde{r}$, $\hat{r}$, and $\hat{y}$ stand in for symbol images in the source: Figures FDA0003913307430000011 to FDA0003913307430000013.)

3. The table data watermarking method resisting column deletion attacks according to claim 1, characterized in that: in step 2 the original data is fed into the feature repair classification model, and Softmax is applied to the classification result output by the model to obtain the probability that each row belongs to each class; the class with the highest probability is selected as the group identifier of the row; the difference between the largest and smallest class probabilities is computed, and the rows whose difference exceeds a preset threshold are selected to determine the watermark row identifiers.

4. The table data watermarking method resisting column deletion attacks according to claim 1, characterized in that: in step 3 the watermark code is split into $\alpha$ substrings of length $k$, denoted $\{W_0, W_1, \dots, W_{\alpha-1}\}$, and substring $W_i$ is embedded with the LSB algorithm into the $k$-th group of the $i$-th column, as follows:

$$y_j.A_i = \left[\mathrm{LSB}\left([y_j.A_i]_2,\ \chi,\ j.W_i\right)\right]_{10}$$

where $j.W_i$ denotes the $j$-th bit of the $i$-th watermark code substring; $y_j.A_i$ denotes the data whose class is $y_j$ and whose column attribute is $A_i$, meaning that rows of the same class carry the same watermark information; LSB denotes least-significant-bit embedding; $\chi$ is the number of embedding bits of $j.W_i$ in $y_j.A_i$; and, in addition, the complement of $j.W_i$ is embedded in the same way in the rows of the same group that were not selected as watermark embedding positions.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211331263.1A CN115713450A (en) 2022-10-28 2022-10-28 Table data watermarking method for resisting column deletion attack

Publications (1)

Publication Number Publication Date
CN115713450A (en) 2023-02-24

Family

ID=85231460

Country Status (1)

Country Link
CN (1) CN115713450A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030179900A1 (en) * 2001-12-13 2003-09-25 Jun Tian Image processing methods using reversible watermarking
KR102239771B1 (en) * 2019-12-30 2021-04-13 경일대학교산학협력단 Apparatus and method for executing deep learning based watermark in various kinds of content environment
CN113222802A (en) * 2021-05-27 2021-08-06 西安电子科技大学 Digital image watermarking method based on anti-attack

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
吕安强;: "抵抗多种攻击的视频水印新方案", 中国图象图形学报, no. 11, 15 November 2009 (2009-11-15) *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination