CN103761459B

CN103761459B - Method and device for embedding and extracting multiple digital watermarks in documents

Info

Publication number: CN103761459B
Application number: CN201410035906.7A
Authority: CN
Inventors: 陈小军; 时金桥; 徐睿; 蒲以国; 赵亮; 张锐
Original assignee: Institute of Information Engineering of CAS
Current assignee: Institute of Information Engineering of CAS
Priority date: 2014-01-24
Filing date: 2014-01-24
Publication date: 2016-08-17
Anticipated expiration: 2034-01-24
Also published as: CN103761459A

Abstract

The invention relates to a method and device for embedding and extracting multiple digital watermarks in documents. The embedding method comprises the following steps: obtaining the original watermark information, a key and the document to be processed, all input by the user; computing the digest of the original watermark information to generate new watermark information; storing the original watermark information and the new watermark information together as one database record; dividing the characters of the document into two layers and, from the total number of characters in the first layer and the bit length of the new watermark information, obtaining the number of groups of new watermark information to embed in the first layer; embedding the groups of new watermark information into the attribute bits of the first layer in order from front to back; and embedding groups of new watermark information into the attribute bits of the second layer. The invention is based on the character attributes of Word-format documents: the key improves security, repeated embedding strengthens robustness, and multiple embedding increases watermark capacity.

Description

Method and device for embedding and extracting multiple digital watermarks in documents

Technical field

The invention relates to the field of digital watermarking, and in particular to a method and device for embedding and extracting multiple digital watermarks in documents.

Background

In recent years, with the rapid development of multimedia and network technology, copyright protection of digital works has become a hot research topic. As an important branch of information hiding, digital watermarking is of great value for the copyright protection of text, video, audio and other multimedia. Digital watermarking embeds copyright information such as serial numbers, text or image logos into multimedia data, serving purposes such as copyright protection, covert communication, authentication of data files and product marking.

The more practical existing text watermarking methods fall into two categories: format-based and natural-language-based. Format-based text watermarking is by far the most common category. It began with line shifting, word shifting and feature coding, and later developed into methods that change font size, color and similar properties. Research on this category is very active, but these methods suffer from weak security and low watermark capacity. Natural-language text watermarking was first proposed in 2002 by Mikhail J. Atallah, Victor Raskin and others at Purdue University in the United States. It adds watermark information mainly by changing sentence structure or substituting synonyms. A natural-language watermark changes the content of the text but not its meaning or format, so once added it is almost impossible to detect and hard to destroy. For standard documents with strict requirements, however, this approach may alter the semantics and is therefore unsuitable. In addition, natural language processing by computer is not yet mature, which has become the bottleneck of natural-language text watermarking.

Summary of the invention

The technical problem to be solved by the invention is to provide a method and device for embedding and extracting multiple digital watermarks in documents that is based on the character attributes of Word-format documents, uses a key to improve security, embeds repeatedly to strengthen robustness, and embeds in multiple layers to increase watermark capacity.

The technical solution of the invention is as follows: a method for embedding multiple digital watermarks in a document, comprising the following steps:

Step 1: obtain the original watermark information, the key and the document to be processed, all input by the user;

Step 2: use a digest algorithm to compute the digest of the original watermark information, generating the new watermark information, and obtain the bit length of the new watermark information;

Step 3: store the original watermark information and the new watermark information together as one database record, so that the original watermark information can be looked up when the watermark is extracted;

Step 4: divide the characters of the document into two layers; from the total number of characters in the first layer and the bit length of the new watermark information, obtain the number of groups of new watermark information to embed in the first layer; embed the groups of new watermark information into the attribute bits of the first layer in order from front to back, with a separator between groups;

Step 5: embed groups of new watermark information into the attribute bits of the second layer in order from back to front, with a separator between groups; the number of groups embedded in the second layer is twice the number embedded in the first layer.

The beneficial effects of the invention are as follows: it is based on the character attributes of Word-format documents; the key improves security, repeated embedding strengthens robustness, and multiple embedding increases watermark capacity.

On the basis of the above technical solution, the invention can be further improved as follows.

Further, dividing the characters of the document into two layers specifically comprises the following steps:

Obtain the Unicode encoding of the character used as the key, convert it into a binary sequence, and take the last two bits of that sequence as the key sequence;

Obtain the Unicode encoding of every character in the document and convert each into a binary sequence;

XOR the key sequence with the binary sequence of each character in the document (in the detailed implementation, with its last two bits). If the result is 00 or 10, the character is assigned to the first layer of the document; if the result is 01 or 11, it is assigned to the second layer.
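The layering rule can be illustrated with a minimal Python sketch. It assumes that "Unicode encoding" means the character's code point (so it only mirrors Word's behaviour for characters in the Basic Multilingual Plane), and the function names `key_bits`, `layer_of` and `split_layers` are hypothetical, introduced only for illustration.

```python
def key_bits(key_char: str) -> int:
    """Return the last two bits of the key character's code point."""
    return ord(key_char) & 0b11


def layer_of(ch: str, key_char: str) -> int:
    """Return 1 or 2 according to the XOR of the last two bits with the key."""
    result = (ord(ch) & 0b11) ^ key_bits(key_char)
    # 00 and 10 (low bit clear) -> layer 1; 01 and 11 (low bit set) -> layer 2
    return 1 if result & 1 == 0 else 2


def split_layers(text: str, key_char: str):
    """Return the character positions that fall into layer 1 and layer 2."""
    layer1 = [i for i, ch in enumerate(text) if layer_of(ch, key_char) == 1]
    layer2 = [i for i, ch in enumerate(text) if layer_of(ch, key_char) == 2]
    return layer1, layer2
```

Because results 00 and 10 share a clear low bit, the layer is in effect decided by the lowest bit of the XOR; with the key character "A", the string "ABCD" splits into positions 0 and 2 (layer 1) and positions 1 and 3 (layer 2).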

Further, the separator is the binary sequence of any rarely used invisible character in Unicode.

Further, embedding the groups of new watermark information into different attribute bits of the document specifically comprises the following steps:

For the first layer, modify the NoProofing attribute of each character in the layer: if the watermark bit currently being embedded is 1, set NoProofing to True; otherwise keep the original value False;

For the second layer, modify the LanguageIDOther attribute of each character in the layer: if the two watermark bits currently being embedded are 00, keep the original value; if they are 01, set LanguageIDOther to wdBasque; if they are 10, set it to wdVenda; if they are 11, set it to wdEstonian.
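The two mappings above can be written down as lookup tables. The following Python sketch only models the mapping between watermark bits and attribute values; it does not touch a Word document. The enumeration members are represented by their names as strings, `None` stands for "leave the original value unchanged", and all function names are hypothetical.

```python
# Layer 1: one watermark bit per character, carried by NoProofing.
NO_PROOFING_FOR_BIT = {"0": False, "1": True}

# Layer 2: two watermark bits per character, carried by LanguageIDOther.
LANGUAGE_FOR_BITS = {
    "00": None,          # keep the original value
    "01": "wdBasque",
    "10": "wdVenda",
    "11": "wdEstonian",
}


def encode_layer1(bits: str) -> list:
    """Map each bit to the NoProofing value of one layer-1 character."""
    return [NO_PROOFING_FOR_BIT[b] for b in bits]


def encode_layer2(bits: str) -> list:
    """Map each bit pair to the LanguageIDOther value of one layer-2 character."""
    if len(bits) % 2:
        raise ValueError("layer 2 consumes bits in pairs")
    return [LANGUAGE_FOR_BITS[bits[i:i + 2]] for i in range(0, len(bits), 2)]


def decode_layer2(values: list) -> str:
    """Inverse of encode_layer2; any unmodified value reads back as 00."""
    reverse = {v: k for k, v in LANGUAGE_FOR_BITS.items()}
    return "".join(reverse.get(v, "00") for v in values)
```

Treating every unrecognised LanguageIDOther value as 00 on decoding is a choice of this sketch; the source only states that 00 leaves the original value in place.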

Further, a method for extracting multiple digital watermarks from a document comprises the following steps:

Step 1a: detect whether watermark information is embedded in the document to be processed; if so, divide all characters into two layers according to the rule and go to step 2a; otherwise, end processing;

Step 2a: extract watermark information from the attribute bits of the first layer and from the attribute bits of the second layer, and use the separators to obtain the actual number of groups extracted from each layer;

Step 3a: from the total number of characters in the first layer and the bit length of the watermark information extracted from the first and second layers, obtain the expected number of groups embedded in the first layer and in the second layer;

Step 4a: if the extracted groups of watermark information are identical and all match one database record, and the actual number of groups in each layer equals the expected number, then all watermark information is intact and the document has not been attacked; query the database and output the original watermark information. Otherwise, perform watermark error correction.

Further, in step 4a, in addition to the extracted groups being identical and all matching one database record and the actual number of groups in each layer equalling the expected number, the watermark information is considered intact only if the number of groups extracted from the attribute bits of the second layer is twice the number extracted from the attribute bits of the first layer.

Further, the NoProofing and LanguageIDOther attribute values of every character in a document are preset by the system to default values. The character attributes of each character in the document are checked one by one; if there are characters whose NoProofing attribute value and LanguageIDOther attribute value differ from the defaults, the document contains embedded watermark information; otherwise, it does not.

Further, the watermark error correction specifically comprises the following steps:

Step 3a.1: for the groups of watermark information extracted by separator, if the groups are not all identical but at least one group matches a database record, return the extracted watermark information and report the damage to the document; otherwise, go to step 3a.2;

Step 3a.2: if none of the groups matches any database record, report that the document is severely damaged and that watermark extraction has failed.
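The decision logic of step 4a together with the error-correction steps can be sketched in Python as follows. This is an illustration under stated assumptions, not the patented implementation: the database is modelled as a plain dictionary from digest bit string to original watermark, and the names `Status` and `check_watermark` are hypothetical. The source does not say what happens when all groups agree and match a record but the group counts are wrong; the sketch reports that case as damaged.

```python
from enum import Enum


class Status(Enum):
    INTACT = "intact"      # step 4a: everything consistent
    DAMAGED = "damaged"    # step 3a.1: at least one group still matches
    FAILED = "failed"      # step 3a.2: no group matches any record


def check_watermark(groups1, groups2, expected1, expected2, records):
    """Classify the extraction result and return (status, original watermark)."""
    all_groups = list(groups1) + list(groups2)
    matched = [g for g in all_groups if g in records]
    consistent = len(set(all_groups)) == 1
    counts_ok = len(groups1) == expected1 and len(groups2) == expected2

    if consistent and matched and counts_ok:
        return Status.INTACT, records[all_groups[0]]
    if matched:
        # Assumption: a count mismatch with matching groups is also "damaged".
        return Status.DAMAGED, records[matched[0]]
    return Status.FAILED, None
```

For example, with one record `{"1010": "owner"}`, extracting `["1010"]` from layer 1 and `["1110", "1010"]` from layer 2 yields the damaged status while still recovering "owner".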

Further, a device for embedding multiple digital watermarks in a document comprises an acquisition module, a generation module, a storage module, a first embedding module and a second embedding module;

The acquisition module is used to obtain the original watermark information, the key and the document to be processed, all input by the user;

The generation module is used to compute the digest of the original watermark information with a digest algorithm, generating the new watermark information, and to obtain the bit length of the new watermark information;

The storage module is used to store the original watermark information and the new watermark information together as one database record, so that the original watermark information can be looked up when the watermark is extracted;

The first embedding module is used to divide the characters of the document into two layers; to obtain, from the total number of characters in the first layer and the bit length of the new watermark information, the number of groups of new watermark information to embed in the first layer; and to embed the groups into the attribute bits of the first layer in order from front to back, with a separator between groups;

The second embedding module is used to embed groups of new watermark information into the attribute bits of the second layer in order from back to front, with a separator between groups; the number of groups embedded in the second layer is twice the number embedded in the first layer.

Further, a device for extracting multiple digital watermarks from a document comprises a detection module, an extraction module, a calculation module and a matching module;

The detection module is used to detect whether watermark information is embedded in the document to be processed; if so, all characters are divided into two layers according to the rule and processing passes to the extraction module; otherwise, processing ends;

The extraction module is used to extract watermark information from the attribute bits of the first layer and from the attribute bits of the second layer, and to obtain from the separators the actual number of groups extracted from each layer;

The calculation module is used to obtain, from the total number of characters in the first layer and the bit length of the watermark information extracted from the first and second layers, the expected number of groups embedded in the first layer and in the second layer;

The matching module is used as follows: if the extracted groups of watermark information are identical and all match one database record, and the actual number of groups in each layer equals the expected number, then all watermark information is intact and the document has not been attacked, and the original watermark information is output after querying the database; otherwise, watermark error correction is performed.

Description of drawings

Fig. 1 is a flowchart of the method for embedding multiple digital watermarks in a document according to the invention;

Fig. 2 is a flowchart of the method for extracting multiple digital watermarks from a document according to the invention;

Fig. 3 is a structural diagram of the device for embedding multiple digital watermarks in a document according to the invention;

Fig. 4 is a structural diagram of the device for extracting multiple digital watermarks from a document according to the invention.

In the drawings, the components denoted by the reference numerals are as follows:

1. acquisition module, 2. generation module, 3. storage module, 4. first embedding module, 5. second embedding module, 6. detection module, 7. extraction module, 8. calculation module, 9. matching module.

Detailed description

The principles and features of the invention are described below with reference to the drawings. The examples given serve only to explain the invention and are not intended to limit its scope.

Fig. 1 is a flowchart of the method for embedding multiple digital watermarks in a document according to the invention; Fig. 2 is a flowchart of the corresponding extraction method; Fig. 3 is a structural diagram of the embedding device; Fig. 4 is a structural diagram of the extraction device.

Example 1

A method for embedding multiple digital watermarks in a document, comprising the following steps:

Dividing the characters of the document into two layers specifically comprises the following steps:

The separator is the binary sequence of any rarely used invisible character in Unicode.

Embedding the groups of new watermark information into different attribute bits of the document specifically comprises the following steps:

A method for extracting multiple digital watermarks from a document, comprising the following steps:

In step 4a, in addition to the extracted groups being identical and all matching one database record and the actual number of groups in each layer equalling the expected number, the watermark information is considered intact only if the number of groups extracted from the attribute bits of the second layer is twice the number extracted from the attribute bits of the first layer.

The method for detecting watermark information in step 1a is as follows:

The NoProofing and LanguageIDOther attribute values of every character in a document are preset by the system to default values. The character attributes of each character in the document are checked one by one; if there are characters whose NoProofing attribute value and LanguageIDOther attribute value differ from the defaults, the document contains embedded watermark information; otherwise, it does not.

The watermark error correction specifically comprises the following steps:

A device for embedding multiple digital watermarks in a document comprises an acquisition module 1, a generation module 2, a storage module 3, a first embedding module 4 and a second embedding module 5;

The acquisition module 1 is used to obtain the original watermark information, the key and the document to be processed, all input by the user;

The generation module 2 is used to compute the digest of the original watermark information with a digest algorithm, generating the new watermark information, and to obtain the bit length of the new watermark information;

The storage module 3 is used to store the original watermark information and the new watermark information together as one database record, so that the original watermark information can be looked up when the watermark is extracted;

The first embedding module 4 is used to divide the characters of the document into two layers; to obtain, from the total number of characters in the first layer and the bit length of the new watermark information, the number of groups of new watermark information to embed in the first layer; and to embed the groups into the attribute bits of the first layer in order from front to back, with a separator between groups;

The second embedding module 5 is used to embed groups of new watermark information into the attribute bits of the second layer in order from back to front, with a separator between groups; the number of groups embedded in the second layer is twice the number embedded in the first layer.

A device for extracting multiple digital watermarks from a document comprises a detection module 6, an extraction module 7, a calculation module 8 and a matching module 9;

The detection module 6 is used to detect whether watermark information is embedded in the document to be processed; if so, all characters are divided into two layers according to the rule and processing passes to the extraction module 7; otherwise, processing ends;

The extraction module 7 is used to extract watermark information from the attribute bits of the first layer and from the attribute bits of the second layer, and to obtain from the separators the actual number of groups extracted from each layer;

The calculation module 8 is used to obtain, from the total number of characters in the first layer and the bit length of the watermark information extracted from the first and second layers, the expected number of groups embedded in the first layer and in the second layer;

The matching module 9 is used as follows: if the extracted groups of watermark information are identical and all match one database record, and the actual number of groups in each layer equals the expected number, then all watermark information is intact and the document has not been attacked, and the original watermark information is output after querying the database; otherwise, watermark error correction is performed.

In a specific implementation, the embedding method of the invention comprises the following six steps:

1) Input the original watermark information, the key and the Word document to be processed;

2) Compute the digest of the original watermark with a message digest algorithm such as MD5 or SHA1, and use the digest as the watermark data from this point on;

3) Store the generated watermark information and the original watermark information as one record in the database, for looking up the original information during extraction;

4) Divide all characters of the Word document into two layers; for each layer, the watermark information is embedded in a different attribute bit;

5) If the total number of characters is N and the bit length of the digest is M, embed K = N/M groups of the watermark, with K rounded down. A separator is required between groups; for example, the invisible Unicode character RLO (U+202E), whose value is 0010000000101110, can be used. For the first-layer characters, embed K groups of the watermark in order from front to back;

6) For the second-layer characters, proceed as in step 5 but embed 2*K groups of the watermark in order from back to front.
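Steps 2) and 5) can be sketched in Python with the standard `hashlib` module. The sketch is illustrative only: the helper names are hypothetical, the watermark is assumed to be UTF-8 text, and the group count follows the text literally (K = N/M rounded down), which does not reserve room for the 16 separator bits between groups.

```python
import hashlib

# RLO, U+202E, written as a 16-bit binary string: "0010000000101110"
RLO_SEPARATOR = format(0x202E, "016b")


def digest_bits(original: str, algorithm: str = "md5") -> str:
    """Return the digest of the original watermark as a string of 0s and 1s."""
    digest = hashlib.new(algorithm, original.encode("utf-8")).digest()
    return "".join(format(byte, "08b") for byte in digest)


def group_count(total_chars: int, digest_len: int) -> int:
    """K = N / M, rounded down, as stated in step 5)."""
    return total_chars // digest_len


def layer_stream(bits: str, groups: int) -> str:
    """Repeat the watermark `groups` times with the separator between groups."""
    return RLO_SEPARATOR.join([bits] * groups)
```

With MD5 the digest is 128 bits long, so a count of N = 1000 characters gives K = 7 groups; the second layer would then receive 2*K = 14 groups.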

Steps 4), 5) and 6) above are the core of the method.

Step 4): the text is layered as follows. Obtain the Unicode encoding of the character used as the key, convert it into a binary sequence, and take the last two bits as the key. During embedding, obtain the Unicode encoding of each text character in turn, convert it into a binary sequence as well, take its last two bits and XOR them with the key. If

● the result is 00 or 10, the character is assigned to the first layer and its NoProofing bit is modified;

● the result is 01 or 11, the character is assigned to the second layer and its LanguageIDOther bit is modified.

Steps 5) and 6): the method uses Microsoft's official OLE interface to manipulate character attributes. The watermark is embedded in two attributes of individual characters in a Word document: NoProofing and LanguageIDOther. Their roles are as follows. For a Selection object (such as a single character), if NoProofing is True, the spelling and grammar checker ignores the specified text. The LanguageIDOther attribute of a character can be set to the enumeration value of a language with few speakers; Microsoft recommends using this property to set or return the language of Western text in documents created in right-to-left language versions of Microsoft Word. LanguageIDOther has 64 enumeration values in total. After study and screening, the method selects three of them, belonging to languages with few speakers (wdBasque, wdVenda, wdEstonian), as the modified values, so that each character can carry two watermark bits. The second layer can therefore embed twice as much information as the first, which increases the watermark capacity. Both attributes can only be discovered, added and modified programmatically, and ordinary operations in Word do not clear them, so the watermark is well concealed and resistant to attack. The watermark is embedded repeatedly to improve robustness: even after attacks such as deletion or modification, as long as one group of the watermark remains intact, the original watermark information can be recovered.

Watermark extraction is the inverse of embedding:

1) Divide all characters of the Word document under examination into two layers according to the rule;

2) Read the data from the characters of each layer one by one according to the embedding rule, obtaining n groups of watermark information;

3) If the n groups are identical and match one database record, all watermark information is intact and the document has not been attacked; query the database and output the original watermark information. Otherwise, go to the watermark error correction algorithm.
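Step 2) of the extraction can be sketched in Python as follows. The sketch assumes the bits have already been read from the attributes and are available as strings; splitting on the separator pattern is a simplification, since the source does not say how a digest that happens to contain the separator pattern is handled. It also assumes that "back to front" means the first bit pair of the stream sits in the last layer-2 character. The function names are hypothetical.

```python
SEPARATOR = "0010000000101110"  # RLO, U+202E, as 16 bits


def split_groups(stream: str, separator: str = SEPARATOR) -> list:
    """Split an extracted bit stream into watermark groups at each separator."""
    return [group for group in stream.split(separator) if group]


def reorder_layer2(pairs_in_document_order: list) -> str:
    """Layer 2 was embedded back to front, so read its bit pairs in reverse."""
    return "".join(reversed(pairs_in_document_order))
```

The number of groups returned by `split_groups` is the "actual number of groups" that the method compares with the expected number for each layer.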

The watermark error correction procedure is:

1) For the n groups of watermarks extracted using the delimiters, if the n groups are not fully consistent but at least one group matches a database record, as happens when the document has been damaged by attacks such as adding or deleting characters, return the extracted watermark information and report the damage to the document. Otherwise, go to 2);

2) If none of the n groups matches a database record, every group has been damaged to some degree; report that the document is severely damaged and that the watermark information cannot be extracted.
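The decision logic of extraction step 3) and the two error correction steps can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `resolve_watermark`, the status strings, and the use of a plain dictionary mapping each stored new watermark (as a bit string) to its original watermark are all assumptions. Preferring the most frequent matching group in the damaged case is also an assumption; the text only requires that at least one group match.

```python
from collections import Counter
from typing import Optional


def resolve_watermark(groups: list[str],
                      database: dict[str, str]) -> tuple[str, Optional[str]]:
    """Classify extracted watermark groups and look up the original watermark.

    `database` is a hypothetical stand-in for the database records:
    it maps the embedded (new) watermark bits to the original watermark.
    Returns (status, original), where status is "intact", "damaged",
    or "unrecoverable".
    """
    matched = [g for g in groups if g in database]

    # Extraction step 3): all groups identical and matching one record.
    if groups and len(matched) == len(groups) and len(set(groups)) == 1:
        return "intact", database[groups[0]]

    # Error correction step 1): groups disagree, but at least one matches.
    if matched:
        best, _ = Counter(matched).most_common(1)[0]  # assumption: majority wins
        return "damaged", database[best]

    # Error correction step 2): nothing matches any record.
    return "unrecoverable", None
```

A caller would report the document as damaged or severely damaged according to the returned status.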

The watermark detection method is:

In Word, the default values of NoProofing and LanguageIDOther for each character are FALSE and 1033 (wdEnglishUS) respectively. The character attributes of the input document are checked one by one; if any character is found for which these two attributes are not at their default values, the document is a document with an embedded watermark.
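The detection rule can be sketched without touching Word itself. In this illustration each character is represented by a hypothetical `CharAttrs` record holding the two attribute values; reading them from a real document through the OLE interface is outside the sketch. The sketch assumes that a character counts as modified when either attribute differs from its default, since each layer changes only one of the two attributes.

```python
from dataclasses import dataclass
from typing import Iterable

DEFAULT_NO_PROOFING = False
DEFAULT_LANGUAGE_ID_OTHER = 1033  # wdEnglishUS, the default stated in the text


@dataclass
class CharAttrs:
    """Hypothetical per-character record of the two watermark attributes."""
    no_proofing: bool = DEFAULT_NO_PROOFING
    language_id_other: int = DEFAULT_LANGUAGE_ID_OTHER


def has_watermark(chars: Iterable[CharAttrs]) -> bool:
    """Return True if any character deviates from the default attribute values."""
    return any(
        c.no_proofing != DEFAULT_NO_PROOFING
        or c.language_id_other != DEFAULT_LANGUAGE_ID_OTHER
        for c in chars
    )
```

The check stops at the first modified character, so an unmarked document costs one full pass and a marked one usually much less.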

Beneficial effects

The character attributes that carry the watermark information are invisible attributes, so the embedded watermark is visually imperceptible and well concealed.

Statistically, each layer holds on average 50% of the characters. If 100 characters are divided into two layers of 50 characters each, the first layer carries 50 × 1 = 50 bits and the second layer 50 × 2 = 100 bits, for a total of 150 bits per 100 characters, that is, a watermark capacity of 150%. Experiments (Table 1) show that the actual watermark capacity is close to 150%, a considerable improvement over other text watermarking algorithms (Table 2).

When the watermark is embedded, the original watermark information is processed with a message digest algorithm, so even if the embedded watermark information is obtained, the original watermark information cannot be recovered from it, which improves the security of the watermark. In addition, the layering depends on the key: if a wrong key is entered during extraction, the characters are assigned to the wrong layers, the extracted attributes are misaligned, and no watermark information is obtained, which further improves security.

If the watermarked document is attacked by adding or deleting characters, the extracted watermark information is evaluated using the delimiters, as shown in Figure 2, where the underlined parts are delimiters and the boxed parts are watermark information. Whenever a complete group of watermark information follows a delimiter, that group can be extracted, which ensures the robustness of the watermarking method to a certain extent.
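Delimiter-based recovery can be illustrated on a plain bit string. The sketch below assumes that every group is preceded by a delimiter and that all groups have the same known length; the function name `recover_groups` and the short delimiter in the usage note are hypothetical. Because a delimiter pattern can also occur by chance inside damaged data, the result is only a list of candidates that would still need to be checked against the database.

```python
def recover_groups(stream: str, delimiter: str, group_len: int) -> list[str]:
    """Collect every complete group that directly follows a delimiter.

    `stream` is the bit string read from one layer. Candidates shorter
    than `group_len` (cut off at the end of the stream) are skipped.
    """
    groups = []
    pos = stream.find(delimiter)
    while pos != -1:
        start = pos + len(delimiter)
        candidate = stream[start:start + group_len]
        if len(candidate) == group_len:
            groups.append(candidate)
        # Slide by one position so that a delimiter shifted by inserted
        # or deleted characters is still found.
        pos = stream.find(delimiter, pos + 1)
    return groups
```

For example, with delimiter `"0000"` and group `"1011"`, a stream in which one bit of the first group was deleted (`"0000" + "101" + "0000" + "1011"`) yields the candidates `["1010", "1011"]`; the second one is the intact group.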

Table 1 Watermark capacity statistics

Table 2 Capacity comparison of text watermarking algorithms

The above are only preferred embodiments of the present invention and are not intended to limit it. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims

1. A method for embedding multiple digital watermarks in a document, characterized by comprising the following steps:

Step 1: obtaining original watermark information input by a user and a document to be processed;

Step 2: computing a digest of the original watermark information with a digest algorithm to generate new watermark information, and obtaining the bit length of the new watermark information from the new watermark information;

Step 3: storing the original watermark information and the new watermark information together in a database as one database record, used to look up the original watermark information when the watermark is extracted;

Step 4: dividing the characters in the document into two layers; obtaining, from the total number of characters in the first layer of the document and the bit length of the new watermark information, the number of groups of new watermark information to be embedded in the first layer; and embedding the multiple groups of new watermark information into the attribute bits in the first layer in order from front to back, the groups being separated by delimiters;

Step 5: embedding multiple groups of new watermark information into the attribute bits in the second layer of the document in order from back to front, the groups being separated by delimiters, wherein the number of groups embedded in the second layer is twice the number of groups embedded in the first layer.
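Steps 2, 4, and 5 of claim 1 can be sketched as follows. The claim does not name the digest algorithm, so SHA-256 is used here purely as an assumption, and the helper names `new_watermark_bits` and `plan_groups` are hypothetical. The sketch also assumes that each group occupies its own bits plus one delimiter and that each first-layer character carries one bit. The second-layer count is simply set to twice the first-layer count, as the claim states, without checking the capacity of the second layer.

```python
import hashlib


def new_watermark_bits(original: str) -> str:
    """Step 2: derive the new watermark as the bit string of a digest.

    SHA-256 is an assumed choice; the source only says "digest algorithm".
    """
    digest = hashlib.sha256(original.encode("utf-8")).digest()
    return "".join(f"{byte:08b}" for byte in digest)


def plan_groups(layer1_chars: int, group_bits: int,
                delimiter_bits: int) -> tuple[int, int]:
    """Steps 4 and 5: number of groups for the first and second layer.

    The first layer holds one bit per character; each group is assumed
    to cost `group_bits + delimiter_bits` characters.
    """
    layer1_groups = layer1_chars // (group_bits + delimiter_bits)
    return layer1_groups, 2 * layer1_groups
```

With 1000 first-layer characters, a 256-bit watermark, and a 16-bit delimiter, `plan_groups(1000, 256, 16)` gives 3 groups for the first layer and 6 for the second.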

2. The method for embedding multiple digital watermarks in a document according to claim 1, characterized in that dividing the characters in the document into two layers specifically comprises the following steps:

obtaining the Unicode code of the character used as the key, converting it into a binary sequence, and taking the last two bits of the binary sequence as the key sequence;

obtaining the Unicode codes of all characters in the document and converting the Unicode code of each character into a binary sequence;

performing an XOR operation between the key sequence and the binary sequence of each character in the document; if the result is 00 or 10, the character is assigned to the first layer of the document; if the result is 01 or 11, it is assigned to the second layer.
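The layering rule of claim 2 can be sketched in a few lines. The sketch assumes that "Unicode code" means the code point of the character and that the two-bit key sequence is XORed with the last two bits of each character's code; the claim does not spell out the alignment, so this is one plausible reading. The function names are hypothetical.

```python
def key_sequence(key_char: str) -> int:
    """Last two bits of the key character's Unicode code point."""
    return ord(key_char) & 0b11


def layer_of(ch: str, key_bits: int) -> int:
    """Return 1 or 2: XOR result 00 or 10 -> first layer, 01 or 11 -> second."""
    result = (ord(ch) & 0b11) ^ key_bits
    return 1 if result in (0b00, 0b10) else 2


def split_layers(text: str, key_char: str) -> tuple[list[int], list[int]]:
    """Return the character indices belonging to each layer."""
    key_bits = key_sequence(key_char)
    layer1, layer2 = [], []
    for index, ch in enumerate(text):
        (layer1 if layer_of(ch, key_bits) == 1 else layer2).append(index)
    return layer1, layer2
```

With key `"A"` (last two bits 01), the text `"abcd"` splits into indices `[0, 2]` for the first layer and `[1, 3]` for the second. Under this reading, a key whose lowest bit differs swaps the two layers, which is the misalignment the description relies on for security.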

3. The method for embedding multiple digital watermarks in a document according to claim 1, characterized in that the delimiter is the binary sequence of any rarely used invisible character in the Unicode encoding.

4. The method for embedding multiple digital watermarks in a document according to claim 1, characterized in that embedding the multiple groups of new watermark information into the different attribute bits in the document specifically comprises the following steps:

for the first layer, modifying the NoProofing property value of each character in the first layer in turn: if the current bit of new watermark information to be embedded is 1, the NoProofing property value is changed to True; otherwise, the original value False is kept unchanged;

for the second layer, modifying the LanguageIDOther property value of each character in the second layer in turn: if the current bits of new watermark information to be embedded are 00, the original value is kept unchanged; if they are 01, the LanguageIDOther property value is changed to wdBasque; if they are 10, it is changed to wdVenda; if they are 11, it is changed to wdEstonian.
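The bit-to-attribute mapping of claim 4 can be written as a small lookup. The numeric constants below are, to the best of our knowledge, the WdLanguageID values for these languages, but they should be verified against the Word object model documentation before use. The function names are hypothetical, and writing the values back to a document through the OLE interface is not shown.

```python
# Assumed WdLanguageID values; verify against the Word documentation.
WD_BASQUE = 1069
WD_VENDA = 1075
WD_ESTONIAN = 1061

LAYER2_VALUES = {"01": WD_BASQUE, "10": WD_VENDA, "11": WD_ESTONIAN}


def layer1_values(bits: str) -> list[bool]:
    """First layer: one bit per character; 1 -> NoProofing True, 0 -> False."""
    return [bit == "1" for bit in bits]


def layer2_values(bits: str, originals: list[int]) -> list[int]:
    """Second layer: two bits per character mapped to LanguageIDOther.

    "00" keeps the character's original value; the other three pairs
    select one of the rarely used languages.
    """
    if len(bits) % 2:
        raise ValueError("second-layer payload must have an even number of bits")
    pairs = [bits[i:i + 2] for i in range(0, len(bits), 2)]
    return [LAYER2_VALUES.get(pair, original)
            for pair, original in zip(pairs, originals)]
```

For the payload `"00011011"` and four characters with the default value 1033, `layer2_values` returns `[1033, 1069, 1075, 1061]`.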

5. A method for extracting multiple digital watermarks from a document, characterized by comprising the following steps:

Step 1a: detecting whether watermark information is embedded in the document to be processed; if so, dividing all characters into two layers according to the rule and proceeding to step 2a; otherwise, ending the process;

Step 2a: extracting watermark information from the attribute bits of the first layer of the document and from the attribute bits of the second layer of the document, and obtaining, according to the delimiters, the actual number of groups of watermark information extracted from each layer;

Step 3a: obtaining, from the total number of characters in the first layer of the document and the bit length of the watermark information extracted from the first and second layers, the predetermined number of groups of watermark information embedded in the first layer and in the second layer respectively;

Step 4a: when the extracted groups of watermark information are consistent and all match one database record, and the actual number of extracted groups in each layer equals the predetermined number, all watermark information is normal and the document has not been attacked, and the original watermark information is output after querying the database; otherwise, performing error correction.

6. The method for extracting multiple digital watermarks from a document according to claim 5, characterized in that, in step 4a, when the extracted groups of watermark information are consistent and all match one database record, and the actual number of extracted groups in each layer equals the predetermined number, the method further includes that all watermark information is normal when the number of groups of watermark information extracted from the attribute bits of the second layer of the document is twice the number of groups extracted from the attribute bits of the first layer.

7. The method for extracting multiple digital watermarks from a document according to claim 5, characterized in that the method for detecting watermark information in step 1a is:

the NoProofing property value and the LanguageIDOther property value of each character in a document are preset by the system to default values; the character attributes of each character in the document from which watermark information is to be extracted are checked one by one; if there is a character whose NoProofing property value and LanguageIDOther property value differ from the default values, the document is a document with embedded watermark information; otherwise, the document is a document without embedded watermark information.

8. The method for extracting multiple digital watermarks from a document according to claim 5, characterized in that the error correction specifically comprises the following steps:

Step 3a.1: for the groups of watermark information extracted using the delimiters, if the groups are not fully consistent and at least one group matches one database record, returning the extracted watermark information and reporting the damage to the document; otherwise, going to step 3a.2;

Step 3a.2: if none of the groups of watermark information matches any database record, reporting that the document is severely damaged and that extraction of the watermark information has failed.

9. A device for embedding multiple digital watermarks in a document, characterized by comprising an acquisition module (1), a generation module (2), a storage module (3), a first embedding module (4), and a second embedding module (5);

the acquisition module (1) is configured to obtain original watermark information input by a user and a document to be processed;

the generation module (2) is configured to compute a digest of the original watermark information with a digest algorithm to generate new watermark information, and to obtain the bit length of the new watermark information from the new watermark information;

the storage module (3) is configured to store the original watermark information and the new watermark information together in a database as one database record, used to look up the original watermark information when the watermark is extracted;

the first embedding module (4) is configured to divide the characters in the document into two layers, to obtain, from the total number of characters in the first layer of the document and the bit length of the new watermark information, the number of groups of new watermark information to be embedded in the first layer, and to embed the multiple groups of new watermark information into the attribute bits in the first layer in order from front to back, the groups being separated by delimiters;

the second embedding module (5) is configured to embed multiple groups of new watermark information into the attribute bits in the second layer of the document in order from back to front, the groups being separated by delimiters, wherein the number of groups embedded in the second layer is twice the number of groups embedded in the first layer.

10. A device for extracting multiple digital watermarks from a document, characterized by comprising a detection module (6), an extraction module (7), a calculation module (8), and a matching module (9);

the detection module (6) is configured to detect whether watermark information is embedded in the document to be processed; if so, all characters are divided into two layers according to the rule and processing passes to the extraction module (7); otherwise, the process ends;

the extraction module (7) is configured to extract watermark information from the attribute bits of the first layer of the document and from the attribute bits of the second layer of the document, and to obtain, according to the delimiters, the actual number of groups of watermark information extracted from each layer;

the calculation module (8) is configured to obtain, from the total number of characters in the first layer of the document and the bit length of the watermark information extracted from the first and second layers, the predetermined number of groups of watermark information embedded in the first layer and in the second layer respectively;

the matching module (9) is configured so that, when the extracted groups of watermark information are consistent and all match one database record, and the actual number of extracted groups in each layer equals the predetermined number, all watermark information is normal and the document has not been attacked, and the original watermark information is output after querying the database; otherwise, error correction is performed.