CN101119429A

CN101119429A - A method and device for digital watermark embedding and extraction

Info

Publication number: CN101119429A
Application number: CNA2006100890668A
Authority: CN
Inventors: 亓文法; 熊怀欣; 李晓龙; 杨斌; 王立东
Original assignee: Peking University; Beijing Founder Electronics Co Ltd
Current assignee: Peking University; Beijing Founder Electronics Co Ltd
Priority date: 2006-08-01
Filing date: 2006-08-01
Publication date: 2008-02-06

Abstract

The invention relates to a method and device for embedding and extracting digital watermarks, and addresses the lack of generality of prior-art methods for embedding and extracting digital watermarks in binary text images. The embedding method includes: obtaining the watermark information bit string to be embedded; searching for a closed connected region formed by characters and obtaining the contour chain code of the connected region; calculating a first number, the number of pixels that need to be flipped, from the number of black pixels in the connected region, the watermark information bit string, and a first step size; and flipping pixels along the contour chain code according to the first number. The embedding and extraction processes are simpler than in the prior art, require no prior blocking or scrambling, and do not depend on the typesetting of the text document. This improves computational efficiency and effectively resists print-and-scan attacks, so that information tracking can be carried out conveniently.

Description

A method and device for digital watermark embedding and extraction

Technical Field

The invention relates to a method and device for embedding and extracting digital watermarks, and in particular to a method and device for embedding and extracting digital watermarks in black-and-white binary text images.

Background Art

With the rapid development of computer networks and multimedia systems, digital media (digital audio, digital images, digital video) have come into wide use, and protecting their copyright and integrity has become an urgent problem. Text documents not only exist in digital form on computers but also circulate on paper through printing, scanning, and photocopying. In fact, many paper documents (such as contracts and bills) are more valuable than multimedia such as audio, video, or images. As computers, printers, and scanners have become widespread, copying and reproduction have become relatively easy, so copyright protection for important text documents is particularly urgent. Digital watermarking is one effective way to address this problem.

Digital watermarking embeds marking information, which may or may not be related to the digital media content, directly into that content without affecting the original content and without being perceptible to the human perceptual system. The information hidden in the content can be used to identify the content's creator or purchaser, to verify that the content is authentic and complete, and for other designated purposes. Digital watermarking provides an effective means of copyright authentication. It can be applied to copyright protection, content authentication, and hidden identification of digital products such as images, text, audio, video, and three-dimensional graphics, and it can also be applied to printed matter.

A text document image can be regarded as a binary digital image. Unlike a grayscale image, which has many gray levels, a binary image has only black and white pixels, so arbitrary modification of pixels causes clearly visible changes. For example, flipping any pixel in an all-black or all-white area has a visually unacceptable effect. For binary images, the only visual redundancy that can be exploited lies where black and white areas meet, that is, at the boundary points of the image. Watermarking methods for binary images therefore concentrate on modifying boundary points, and a pixel cannot be considered in isolation: the state of its neighborhood must be taken into account. Binary images, especially binary text images, are widely used in books and newspapers. Because they are frequently printed, robustness of the watermark against print-and-scan attacks is particularly important.

Existing binary image watermarking methods fall into two categories: global image feature modification and local image feature modification. Global image feature modification hides information in the geometric features of large image blocks or between them; common methods include line-spacing shifts, word-spacing shifts, and fine adjustment of character structure. These methods hide relatively little information, and the embedding is fairly complex. Local image feature modification hides information by modifying the statistical features of image blocks; typical methods include parity embedding, step-size parity, ratio modification, run-length modification, boundary modification, and character feature modification. What they have in common is that they change the statistical features of a local image by modifying boundary pixels. The shortcoming of current local image feature modification methods is that they are suitable only for embedding and extracting watermark information in digital images, and their resistance to print-and-scan attacks is clearly insufficient. In addition, when selecting pixels to change, these methods must scramble the image within a region, count the black pixels block by block, and then modify specific pixels of the image according to certain rules and the bit string to be embedded. For ordinary text documents, however, differences in typesetting make it impossible to locate the regions accurately, so extracting the information is difficult.

The method proposed in patent document CN 1567353A is one such local image feature modification method. That patent, "A method for embedding a watermark in a binary image", comprises the following steps: a. extracting edges from the binary image; b. analyzing the edge points and calculating the priority of modifiable pixels; c. scrambling the binary image; d. convolutionally encoding the watermark signal; e. embedding the watermark image in the scrambled binary image. Its "method for extracting a watermark" comprises the following steps: a′. scrambling the watermarked binary image and dividing it into blocks; b′. extracting one bit of watermark information from each sub-block; c′. applying Viterbi decoding to the watermark matrix to obtain the watermark. However, that method is likewise not robust to the print-and-scan process.

Summary of the Invention

The present invention provides a method and device for embedding and extracting digital watermarks, to solve the prior-art problem that methods for embedding and extracting digital watermarks in binary text images lack generality, and further to solve the problems that watermark extraction is not robust to print-and-scan attacks and that the visual quality of the watermarked image is poor.

The invention provides a method for embedding a digital watermark, comprising the following steps:

obtaining the watermark information bit string to be embedded;

searching for a closed connected region formed by characters, and obtaining the contour chain code of the connected region;

calculating a first number, the number of pixels that need to be flipped, from the number of black pixels in the connected region, the watermark information bit string, and a first step size;

flipping pixels along the contour chain code according to the first number.

Preferably, in the step of obtaining the watermark information bit string to be embedded, the watermark information bit string is encrypted.

Preferably, the step of searching for a closed connected region formed by characters and obtaining the contour chain code of the connected region comprises:

identifying and marking the boundary points of a character according to differences in pixel color within the eight-neighborhood;

finding, in a first order, the first boundary point not yet traversed, taking it as the contour starting point, and recording the direction of entry into the starting point together with the chain code information;

finding, in a second order based on the entry direction of the previous step, the next boundary point as a contour point, and recording the direction of entry into that contour point together with the chain code information, until the starting point is reached again;

obtaining the contour chain code of the connected region from all the recorded chain code information.

Preferably, the first step size is the step size of the step-size parity method or of the parity embedding method.

Preferably, the first number is obtained from the following formula:

First number = n + original number of black pixels, where

$n = \begin{cases} 0, & m \in (0, 600) \cup (1700, +\infty) \\ m - 600, & m \in [600, 750) \\ Q - m \,\%\, Q, & m \in \{ m \mid m \,\%\, Q > Q \times 2/3,\ m / Q = w,\ m \ge 600,\ m \le 1700 \} \\ Q \times 5/3 - m \,\%\, Q, & m \in \{ m \mid m \,\%\, Q > Q \times 2/3,\ m / Q \ne w,\ m \ge 600,\ m \le 1700 \} \\ m \,\%\, Q, & m \in \{ m \mid m \,\%\, Q > Q \times 2/3,\ m / Q = w,\ m \ge 600,\ m \le 1700 \} \\ Q \times 2/3 - m \,\%\, Q, & m \in \{ m \mid m \,\%\, Q > Q \times 2/3,\ m / Q = 2,\ m \ge 600,\ m \le 1700 \} \end{cases}$

In the above formula, m is the number of black pixels and n is the number of black pixels to be added. When n is greater than 0, the number of black pixels is increased by flipping white pixels to black; when n is less than 0, the number of black pixels is reduced by flipping black pixels to white. w is the watermark information bit string to be embedded, and Q is the first step size.
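The relationship between m, n, w, and Q can be illustrated with a simplified sketch. It does not reproduce the piecewise formula above. It assumes a plain quantization rule: move the black-pixel count to the nearest multiple of Q whose multiplier has the same parity as the bit w. The function name `flips_needed`, the treatment of w as a single bit, and the choice to skip every region outside 750 to 1700 black pixels are assumptions made for illustration.

```python
def flips_needed(m, w, q, lo=750, hi=1700):
    """Return n, the signed change in black-pixel count (simplified sketch).

    m: current number of black pixels in the connected region
    w: watermark bit (0 or 1) to embed in this region
    q: step size Q
    lo, hi: assumed usable range; regions outside it are left untouched
    """
    if not lo <= m <= hi:
        return 0  # region too small or too large: embed nothing
    # Candidate targets are multiples k*q inside the range with k % 2 == w.
    targets = [k * q for k in range(lo // q, hi // q + 1)
               if k % 2 == w and lo <= k * q <= hi]
    if not targets:
        return 0  # no admissible target for this step size
    target = min(targets, key=lambda t: abs(t - m))
    # n > 0: flip white pixels to black; n < 0: flip black pixels to white.
    return target - m
```

With Q = 100 and m = 1030, embedding bit 0 gives n = -30 (target 1000) and embedding bit 1 gives n = 70 (target 1100). Placing the count on an exact multiple of Q leaves a margin of about Q/2 pixels in either direction before the rounded quotient changes parity.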

Preferably, in the step of flipping pixels along the contour chain code according to the first number, white pixels lying on the bounding rectangle of the character are not flipped.

The invention also provides a method for extracting a digital watermark, comprising the following steps:

scanning a text document and processing the scan to obtain individual character image regions;

searching for a closed connected region formed by characters, obtaining the contour chain code of the connected region, and counting the number of black pixels in the connected region;

extracting the watermark information bit string from the number of black pixels in the connected region and the first step size.

Preferably, in the step of scanning and processing the text document, the scanned document is segmented using a region-based, boundary-based, or edge-based image segmentation method.

Preferably, the step of searching for a closed connected region formed by characters, obtaining the contour chain code of the connected region, and counting the number of black pixels in the connected region comprises:

obtaining the contour chain code of the connected region from all the recorded chain code information;

counting the number of black pixels in the connected region.

Preferably, in the step of extracting the watermark information bit string from the number of black pixels in the connected region and the first step size, the watermark information bit string is obtained from the following formula:

$w = \begin{cases} 0, & m \in \{ m \mid [m / Q + 0.5] \,\%\, 2 = 0,\ m \ge 750,\ m \le 1700 \} \\ 1, & m \in \{ m \mid [m / Q + 0.5] \,\%\, 2 = 1,\ m \ge 750,\ m \le 1700 \} \end{cases}$

where m is the number of black pixels, w is the watermark information bit string, and Q is the first step size.
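The extraction rule can be sketched in a few lines. The sketch assumes that the square brackets in the formula denote the integer part, so that [m/Q + 0.5] rounds m/Q to the nearest integer, and that regions outside the range 750 to 1700 carry no bit. The function name `extract_bit` and the `None` return value are illustrative choices.

```python
def extract_bit(m, q, lo=750, hi=1700):
    """Recover one watermark bit from a region's black-pixel count.

    m: number of black pixels counted in the scanned connected region
    q: step size Q used at embedding time
    Returns 0 or 1, or None when m lies outside [lo, hi].
    """
    if not lo <= m <= hi:
        return None  # region carries no watermark bit
    # floor(m/q + 0.5) computed in integer arithmetic to avoid float rounding
    rounded = (2 * m + q) // (2 * q)
    return rounded % 2
```

For Q = 100, a region embedded at 1000 black pixels still decodes to 0 if printing and scanning shift the count to 1040, because the rounded quotient remains 10.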

Preferably, the method further comprises the following step:

decrypting the extracted watermark information bit string.

The invention further provides a device for embedding a digital watermark, comprising:

a watermark information acquisition module, configured to obtain the watermark information bit string to be embedded;

a contour chain code acquisition module, configured to search for a closed connected region formed by characters and obtain the contour chain code of the connected region;

a black pixel counting module, connected to the contour chain code acquisition module and configured to count the number of black pixels in the connected region;

a first number calculation module, connected to the watermark information acquisition module and the black pixel counting module, and configured to calculate the first number, the number of pixels that need to be flipped, from the number of black pixels, the watermark information bit string, and the first step size;

a pixel flipping module, connected to the first number calculation module and the contour chain code acquisition module, and configured to flip pixels along the contour chain code according to the first number.

Preferably, the device further comprises an encryption module, connected to the watermark information acquisition module and configured to encrypt the watermark information bit string.

Preferably, the contour chain code acquisition module comprises:

a first boundary point marking unit, configured to identify and mark the boundary points of a character according to differences in pixel color within the eight-neighborhood;

a first chain code traversal unit, configured to find, in a first order and among the boundary points marked by the first boundary point marking unit, the first boundary point not yet traversed, take it as the contour starting point, and record the direction of entry into the starting point together with the chain code information; and to find, in a second order based on the entry direction, the next boundary point as a contour point and record the direction of entry into that contour point together with the chain code information, until the starting point is reached again;

a first chain code forming unit, configured to obtain the contour chain code of the connected region from all the chain code information recorded by the first chain code traversal unit.

Preferably, the first number calculation module comprises:

a step size determination unit, configured to determine the first step size from the step size of the step-size parity method or of the parity embedding method;

a number calculation unit, configured to obtain the first number from the calculation formula given above for the embedding method,

where: first number = n + original number of black pixels; m is the number of black pixels and n is the number of black pixels to be added. When n is greater than 0, the number of black pixels is increased by flipping white pixels to black; when n is less than 0, the number of black pixels is reduced by flipping black pixels to white. w is the watermark information bit string to be embedded, and Q is the first step size.

The invention further provides a device for extracting a digital watermark, comprising:

an image segmentation module, configured to process a scanned text document to obtain individual character image regions;

a character black pixel counting module, connected to the image segmentation module and configured to search for a closed connected region formed by characters, obtain the contour chain code of the connected region, and count the number of black pixels in the connected region;

a watermark information extraction module, connected to the character black pixel counting module and configured to extract the watermark information bit string from the number of black pixels in the connected region and the first step size.

Preferably, the character black pixel counting module comprises:

a second boundary point marking unit, configured to identify and mark the boundary points of a character according to differences in pixel color within the eight-neighborhood;

a second chain code traversal unit, configured to find, in a first order and among the boundary points marked by the second boundary point marking unit, the first boundary point not yet traversed, take it as the contour starting point, and record the direction of entry into the starting point together with the chain code information; and to find, in a second order based on the entry direction, the next boundary point as a contour point and record the direction of entry into that contour point together with the chain code information, until the starting point is reached again;

a second chain code forming unit, configured to obtain the contour chain code of the connected region from all the chain code information recorded by the second chain code traversal unit;

a black pixel counting unit, configured to count the number of black pixels in the connected region.

Preferably, the watermark information extraction module extracts the watermark information bit string according to the formula given above for the extraction method.

Preferably, the device further comprises a decryption module, connected to the watermark information extraction module and configured to decrypt the extracted watermark information bit string.

The beneficial effects of the invention are as follows:

Because the invention embeds the digital watermark by flipping pixels to change the number of black pixels in connected regions, and all text has connected regions, the invention is generally applicable to binary text images and can be applied to text documents in any character set, including English, numerals and symbols, Chinese characters, and Japanese;

Because the invention controls embedding through the relationship between the step size and the number of black pixels added, it is highly robust to the print-and-scan process and can be applied to conventionally printed text documents as well as to printed and scanned ones;

Compared with the prior art, the embedding and extraction processes of the invention are simpler, require no prior blocking or scrambling, and do not depend on the typesetting of the text document content. This improves computational efficiency, and tracking remains convenient even under the influence of print-and-scan attacks;

Because the invention depends only on the area occupied by the characters in the text document, its resistance to geometric attacks such as translation and rotation during printing and scanning is also enhanced;

Because the embedded information can be processed in various ways, such as compression, encryption, and the addition of check bits, the embedded information is flexible, its volume is relatively large, and various security measures are easy to implement; extraction accuracy is also considerably higher than with existing methods.

Brief Description of the Drawings

Fig. 1 is the main flowchart of the digital watermark embedding method of the invention;

Fig. 2 is a flowchart of obtaining the closed contour chain code of a connected region according to the invention;

Fig. 3 is a schematic comparison of the uppercase English character S in three cases: with watermark bit 0 embedded, with watermark bit 1 embedded, and with no information embedded;

Fig. 4 is a schematic diagram of an original text document image;

Fig. 5 is a schematic diagram of the text document image obtained by embedding watermark information in the original text of Fig. 4;

Fig. 6 is a flowchart of extracting watermark information embedded according to the method of the invention;

Fig. 7 is a schematic structural diagram of the watermark embedding device of the invention;

Fig. 8 is a schematic structural diagram of the watermark extraction device of the invention;

Fig. 9 is a schematic flowchart of watermark embedding and extraction according to a specific embodiment of the invention.

Detailed Description

The idea of the invention is to obtain the contour chain code of a closed connected region and then embed the watermark information into the text image using the number of black pixels in the connected region and a selected step size, thereby obtaining a digital watermark embedding method. Specific embodiments of the invention are described below with reference to the accompanying drawings.

Fig. 1 is the main flowchart of the digital watermark embedding scheme of the invention, which comprises the following steps:

S101. Obtain the watermark information bit string to be embedded, and preprocess the watermark information;

In this step, the watermark information bit string to be embedded is first obtained; the information may be any data that can be digitized, such as digital images, text, audio, or video. The bit string is compressed with a compression algorithm to increase the amount of information that can be embedded; this embodiment uses the general-purpose LZW compression algorithm for illustration, but is not limited to it. For security, in a preferred implementation the compressed bit string may also be encrypted. The necessary data-integrity check bits are then appended to the encrypted ciphertext string so that the data can be verified at extraction time, improving the accuracy of information extraction.
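The preprocessing of step S101 can be sketched as follows. This is an illustrative pipeline rather than the embodiment itself: the Python standard library has no LZW codec, so `zlib` (DEFLATE) stands in for the compression step, a CRC-32 stands in for the unspecified check bits, and the optional encryption step is omitted. The function names are hypothetical.

```python
import zlib

def prepare_watermark_bits(payload):
    """Compress the payload, append check bits, and return a list of 0/1 bits."""
    compressed = zlib.compress(payload)  # stand-in for LZW (assumption)
    checksum = zlib.crc32(compressed).to_bytes(4, "big")  # stand-in check bits
    data = compressed + checksum
    # Most significant bit of each byte first.
    return [(byte >> (7 - i)) & 1 for byte in data for i in range(8)]

def recover_payload(bits):
    """Inverse of prepare_watermark_bits; raises ValueError on a failed check."""
    data = bytes(
        int("".join(str(b) for b in bits[i:i + 8]), 2)
        for i in range(0, len(bits), 8)
    )
    body, checksum = data[:-4], data[-4:]
    if zlib.crc32(body).to_bytes(4, "big") != checksum:
        raise ValueError("check bits do not match: extracted data is corrupt")
    return zlib.decompress(body)
```

Each bit in the returned list would then be assigned to one character region during embedding. Verifying the check bits before decompression lets the extractor reject a bit string damaged by printing and scanning.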

S102. Search in turn for the closed connected regions formed by characters in the layout, extract the character contour information, and obtain the closed contour chain code of each character's connected region;

In this step, for the character image in each connected region, the continuous closed contour of the character is found and represented as a chain code. The chain code records the position of each contour point, its direction relative to the previous contour point, and a connectivity flag.

The method for finding continuous closed contours is as follows:

First, a flag field is allocated for the character bitmap obtained, marking whether each pixel is a boundary point of the character image. Specifically, each black pixel is taken as a center, and its eight-neighborhood is checked for white pixels: if one exists, the pixel is marked 1; otherwise it is marked 0. Here 1 indicates that the pixel is a boundary point of the character image, and 0 that it is not.
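The boundary-marking rule can be written as a minimal sketch. It assumes the character bitmap is a list of rows with 1 for black and 0 for white, and that pixels outside the bitmap count as white, a convention the text does not specify. The function name `mark_boundary` is illustrative.

```python
def mark_boundary(img):
    """Return a flag field: 1 where a black pixel has a white 8-neighbor, else 0."""
    h, w = len(img), len(img[0])
    flags = [[0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            if img[r][c] != 1:
                continue  # only black pixels can be boundary points
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    if dr == 0 and dc == 0:
                        continue
                    rr, cc = r + dr, c + dc
                    # Assumption: anything outside the bitmap is white.
                    outside = not (0 <= rr < h and 0 <= cc < w)
                    if outside or img[rr][cc] == 0:
                        flags[r][c] = 1
    return flags
```

For a solid 3 × 3 black block, every pixel except the center is flagged, since only the center has eight black neighbors.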

Traversing the character contour takes the following two steps.

Step 1: find the starting point of the contour traversal. The character bitmap is scanned in the first order, which in this embodiment is top to bottom and left to right, to find the first boundary point not yet traversed. That point is taken as the contour starting point, and the direction of entry into it is recorded; in this embodiment the entry direction is assumed to be left to right.

Step 2: taking the previous entry direction as the reference, repeatedly find the next contour point to traverse in the second order. In this embodiment the second order follows the "look left first" principle: relative to the direction of entry into the current contour point, the neighbors of the current point are probed in the clockwise order left, up, right, down. If a neighbor is a boundary point, it becomes the next contour point and the direction of entry into it is recorded; otherwise probing continues. Returning to the contour starting point indicates that one closed contour has been fully traversed.

Repeating these two steps while recording the chain code information during traversal yields the chain codes of all closed contours of the character.

The purpose of the first and second orders is to visit the boundary points one after another during traversal, so that the resulting contour chain code is complete. On this principle, the first order could equally be bottom to top and then right to left, and so on, and the second order could equally follow a "look up first", "look right first", or "look down first" principle.
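A minimal sketch of the "look left first" traversal is given below. It assumes a flag field in which boundary points are marked 1, probes only the four edge-adjacent neighbors, and stops as soon as the starting point is reached again, as described above. The direction numbering, the step cap, and the function name `trace_contour` are illustrative assumptions.

```python
# Directions in image coordinates (row index grows downward):
# 0 = east (left to right), 1 = south, 2 = west, 3 = north.
STEPS = [(0, 1), (1, 0), (0, -1), (-1, 0)]

def trace_contour(flags, start, heading=0):
    """Trace one closed contour; return a list of ((row, col), entry_direction)."""
    h, w = len(flags), len(flags[0])

    def is_boundary(r, c):
        return 0 <= r < h and 0 <= c < w and flags[r][c] == 1

    chain = []
    r, c = start
    max_steps = 4 * h * w  # safety cap against endless loops (assumption)
    while len(chain) < max_steps:
        # Probe relative to the entry direction: left, ahead, right, back.
        for turn in (-1, 0, 1, 2):
            d = (heading + turn) % 4
            nr, nc = r + STEPS[d][0], c + STEPS[d][1]
            if is_boundary(nr, nc):
                break
        else:
            break  # isolated point: no neighboring boundary point
        r, c, heading = nr, nc, d
        chain.append(((r, c), d))  # position plus direction of entry
        if (r, c) == start:
            break  # back at the starting point: contour closed
    return chain
```

On the flag field of a solid 3 × 3 block, starting at the top-left corner with a left-to-right entry direction, the trace runs clockwise around the eight boundary points and records the directions east, east, south, south, west, west, north, north.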

Fig. 2 is a flowchart of obtaining the closed contour chain code of a connected region. As shown in the figure, the method for obtaining the contour chain code comprises the following steps:

S201. First, obtain the character image in which the digital watermark is to be embedded;

S202. Locate the bounding rectangle of each character in the image;

The localization proceeds as follows. First, the black pixel counts within each character image region are projected in the horizontal direction, and the left and right boundaries of the region's bounding rectangle are determined from the positions where the black pixel distribution starts and ends on the left and right. The black pixel counts of the character image region are then projected in the vertical direction, and the upper and lower boundaries of the bounding rectangle are determined from the positions where the black pixel distribution starts and ends at the top and bottom, giving the complete bounding rectangle.

S203. Mark all boundary points;

S204. Scan for a boundary point that has not yet been traversed; if none is found, go to S212; if one is found, go to S205;

S205. Mark this point as the contour starting point and set it as the current point, with the direction of entry into it set to left-to-right;

S206. Relative to the direction of entry into the current point, probe the points adjacent to the current point in the order left, up, right, down;

S207. Determine whether the probed point is a boundary point; if not, return to S206; if so, go to S208;

S208. Set the probed point as the current point;

S209. Record the direction of entry into the current point and mark the current point as traversed;

S210. Record the code chain information of the current point;

S211. Determine whether the current point is the contour starting point; if not, go to S206; if so, go to S204;

S212. Obtain the contour code chain from the recorded code chain information of each point. Strung together, the code chain information of the individual points forms the complete "contour code chain" of one closed connected region.
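Steps S203 to S212 can be sketched in Python as follows. This is a minimal illustration, not the patented implementation. It assumes a binary image stored as a list of rows with 1 for black and 0 for white, a first order of top to bottom and left to right, and a hypothetical direction encoding (0 = up, 1 = right, 2 = down, 3 = left) as the code chain information. The names `is_boundary` and `trace_contours` are illustrative, and the step cap is a safeguard that the text does not mention.

```python
# Direction codes in clockwise order (an assumed encoding): 0=up, 1=right, 2=down, 3=left.
STEPS = [(-1, 0), (0, 1), (1, 0), (0, -1)]


def is_black(img, r, c):
    """True if (r, c) lies inside the image and is a black pixel (value 1)."""
    return 0 <= r < len(img) and 0 <= c < len(img[0]) and img[r][c] == 1


def is_boundary(img, r, c):
    """S203: a black pixel with at least one non-black pixel in its 8-neighbourhood."""
    if not is_black(img, r, c):
        return False
    return any(not is_black(img, r + dr, c + dc)
               for dr in (-1, 0, 1) for dc in (-1, 0, 1) if (dr, dc) != (0, 0))


def trace_contours(img):
    """Return a list of (start_point, chain) pairs, one per traced contour."""
    visited = set()
    chains = []
    for r in range(len(img)):              # S204: scan in the assumed first order
        for c in range(len(img[0])):
            if (r, c) in visited or not is_boundary(img, r, c):
                continue
            start, cur, heading = (r, c), (r, c), 1   # S205: entered left-to-right
            visited.add(start)
            chain = []
            for _ in range(4 * len(img) * len(img[0])):   # safety cap (assumption)
                for turn in (-1, 0, 1, 2):   # S206: left, ahead, right, back
                    d = (heading + turn) % 4
                    nxt = (cur[0] + STEPS[d][0], cur[1] + STEPS[d][1])
                    if is_boundary(img, *nxt):            # S207
                        break
                else:
                    break                    # isolated point: no boundary neighbour
                cur, heading = nxt, d        # S208, S209
                visited.add(cur)
                chain.append(d)              # S210
                if cur == start:             # S211
                    break
            chains.append((start, chain))    # S212
    return chains
```

For a solid 3×3 black square, this sketch produces one chain that walks the eight outer pixels clockwise from the top-left pixel; the centre pixel is not a boundary point and is never visited.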

S103. Count the black dots in the connected region, determine from the selected step size whether the region can be used to embed watermark information, and calculate the number of pixels to be flipped;

The bounding rectangle is the smallest rectangle that contains all black pixels of a single character image region. Therefore, before the contour boundary of the character image is searched, the approximate rectangular region of the character image is located first; counting all black dots in that region then gives the number of black dots in the connected region.
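The projection-based positioning and the black-dot count can be illustrated with a short Python sketch. It assumes, for simplicity, that the input already contains a single character and is stored as a list of rows with 1 for black and 0 for white; `bounding_rect` and `count_black` are hypothetical helper names.

```python
def bounding_rect(img):
    """Return (top, bottom, left, right), inclusive, of the black pixels, or None."""
    col_counts = [sum(col) for col in zip(*img)]   # projection onto the horizontal axis
    row_counts = [sum(row) for row in img]         # projection onto the vertical axis
    cols = [i for i, n in enumerate(col_counts) if n > 0]
    rows = [i for i, n in enumerate(row_counts) if n > 0]
    if not cols:
        return None                                # no black pixel at all
    return rows[0], rows[-1], cols[0], cols[-1]


def count_black(img, rect):
    """Count the black pixels inside the inclusive rectangle."""
    top, bottom, left, right = rect
    return sum(img[r][c]
               for r in range(top, bottom + 1)
               for c in range(left, right + 1))
```

Segmenting a full page into individual characters would apply the same projections line by line and character by character; that part is not shown here.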

The 0/1 data bit of the embedded watermark is determined, in this embodiment, by the step parity method, an extension of the parity embedding method: 0 or 1 is represented by the parity of the multiple of a step size Q to which the number of black pixels in the closed connected region corresponds. This gives the algorithm some fault tolerance: after a watermark attack, the bit can still be detected as long as the number of changed pixels does not exceed Q/2. An image obtained by printing and scanning looks the same as the original, but the process actually combines several image processing operations and substantially changes pixel values and geometric positions, so the watermarking algorithm must be highly robust to withstand it. The choice of Q is therefore particularly important. If the step size is too large, more pixels have to be changed, which noticeably affects the visual appearance of the original text image; if it is too small, the embedded data is easily lost in the print-and-scan process. As a general rule, the step size should not exceed twice the number of boundary pixels of the character image. In this preferred embodiment, a step size of 200 is chosen, based on the number of boundary points of characters in commonly used fonts. Characters with fewer than 750 black dots fall into the following cases:

(a) the character consists of two connected regions, such as i and j;

(b) flipping its contour points would have a large visual impact, such as the capital letter I;

(c) fewer than 200 contour points can be flipped, such as r and t.

For these reasons, characters with fewer than 750 black dots cannot carry embedded information.

Characters with more than 1700 black dots, such as W and M in Arial at font size No. 5, are also not used for embedding: the average change in their black-dot count after printing and scanning approaches or exceeds 100, so a step size of 200 no longer provides robustness against print-and-scan processing.

To distinguish such characters from those carrying embedded information, the black-dot count of any character with more than 600 and fewer than 750 black dots is reduced to below 600. This prevents a character without an embedded watermark from being misjudged as one with an embedded watermark, which would misalign the extracted bits and severely distort the extracted information.
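The thresholds above amount to a three-way classification of each character by its black-dot count. The sketch below restates them in Python; the function name and the returned labels are illustrative only, and the default values are the ones given for a step size of 200.

```python
def classify_character(black_count, low=750, high=1700, guard=600):
    """Classify a character by its black-dot count.

    "embed":  the count lies in [low, high], so the character may carry one bit.
    "reduce": the count lies strictly between guard and low; it is lowered
              below guard so the character is not mistaken for a carrier.
    "skip":   the character is left untouched.
    """
    if low <= black_count <= high:
        return "embed"
    if guard < black_count < low:
        return "reduce"
    return "skip"
```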

To improve robustness and the invisibility of the watermark, when n black dots would have to be added to reach m times the step size, the count may be increased only to m×Q-Q/3 black dots; since print-and-scan processing increases the number of black dots, the count still falls within the recognition range of m×Q±Q/2. For a character whose black-dot count lies between m×Q-Q/3 and m×Q, reducing it to (m-1)×Q would change too many pixels and have a large visual impact, and after printing and scanning an increase of more than Q/2 pixels would already cause misrecognition, so robustness would be poor. Increasing the count to (m+2)×Q-Q/3 instead extends the range of correct recognition to Q/2+Q/3 and improves robustness.

In summary, with the step size Q set to 200, the number of character pixels to be flipped is calculated by the following formula:

where m is the number of black dots; n is the number of dots to be added: when n is greater than 0, black dots are added, that is, some white pixels are flipped to black; when n is less than 0, black dots are removed, that is, some black pixels are flipped to white; and w is the watermark information bit to be embedded.
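The sketch below is not the formula of this embodiment. It illustrates only the plain step parity idea described earlier, without the Q/3 refinement, under two assumptions: an even multiple of Q encodes bit 0 and an odd multiple encodes bit 1, and the target is the nearest multiple of Q with the required parity. `flips_needed` and `extract_bit` are hypothetical names.

```python
def flips_needed(m, w, q=200):
    """Signed number of dots to add (n > 0) or remove (n < 0) so that the
    black-dot count m moves to the nearest multiple of q whose parity is w."""
    k = (m + q // 2) // q          # nearest multiple of q, as an index
    if k % 2 != w:
        # Wrong parity: move to the neighbouring multiple on the side of m.
        k = k + 1 if m >= k * q else k - 1
    return k * q - m


def extract_bit(m, q=200):
    """Recover the bit as the parity of the nearest multiple of q."""
    return ((m + q // 2) // q) % 2
```

With Q = 200, a character with 950 black dots needs 50 more dots to encode 1 (target 1000) or 150 fewer to encode 0 (target 800). After embedding, a drift of fewer than Q/2 dots in either direction still decodes to the same bit in this simplified scheme.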

S104. Flip pixels consecutively along the contour code chain until the specified number is reached, thereby embedding one bit of information;

The contour code chain allows a group of pixels to be flipped consecutively. When black dots must be added, the contour is traversed in the direction of the code chain and white dots in the eight-neighbourhood of black dots are flipped to black until n pixels have been flipped. White dots on the character's bounding rectangle are not flipped. This keeps the character from growing outward and becoming visibly bolder, and prevents the boundaries of two closely spaced characters from merging, which would cause the extent of the connected region to be misjudged during extraction.

When black dots must be removed, the contour is likewise traversed in the direction of the code chain and black contour dots are flipped to white until n pixels have been flipped. Black dots whose upper, lower, left, and right neighbours are all the same colour are not flipped, which helps keep the result visually smooth.
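The two flipping rules can be sketched in Python as follows. The sketch assumes the contour is already available as a list of (row, column) points in traversal order, that the image is a list of rows with 1 for black and 0 for white, and that `n` is passed as a positive count in both functions. The names are illustrative, and the preference for inner contours is not modelled.

```python
NEIGHBOURS8 = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]


def in_bounds(img, r, c):
    return 0 <= r < len(img) and 0 <= c < len(img[0])


def add_black(img, contour, n, rect):
    """Flip up to n white pixels next to contour points to black.

    Pixels on or outside the bounding rectangle (top, bottom, left, right)
    are never flipped, so the character does not grow outward.
    Returns the number of pixels actually flipped.
    """
    top, bottom, left, right = rect
    flipped = 0
    for r, c in contour:
        for dr, dc in NEIGHBOURS8:
            if flipped == n:
                return flipped
            rr, cc = r + dr, c + dc
            if rr <= top or rr >= bottom or cc <= left or cc >= right:
                continue
            if img[rr][cc] == 0:
                img[rr][cc] = 1
                flipped += 1
    return flipped


def remove_black(img, contour, n):
    """Flip up to n black contour pixels to white, skipping pixels whose
    four direct neighbours are all black. Returns the number flipped."""
    flipped = 0
    for r, c in contour:
        if flipped == n:
            break
        if img[r][c] != 1:
            continue
        four = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
        if all(in_bounds(img, rr, cc) and img[rr][cc] == 1 for rr, cc in four):
            continue
        img[r][c] = 0
        flipped += 1
    return flipped
```

Both functions return how many pixels were changed, so a caller can detect when one contour is exhausted before the required count is reached and continue on another contour.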

To preserve invisibility, the preferred embodiment modifies the inner contours of a character first and the outer contour second. Given the contour code chains, a character image forming an independent connected region is taken to have exactly one outer contour, so it suffices to find the second contour starting point and begin traversing and modifying pixels from there. Traversal starts from the first contour starting point only if there is just one contour starting point, that is, only an outer contour, or if modifying all points of the inner contours still does not meet the embedding requirement.

Steps S102 to S104 are repeated until the entire information bit string has been embedded.

Fig. 3 shows the capital letter "S" in Arial at font size No. 5 with watermark bit 0 embedded, with bit 1 embedded, and with nothing embedded; the enlarged results are shown in (a), (b), and (c), respectively. As the figure shows, the visual smoothness before and after the change is good, and the hidden information is hard to notice without careful side-by-side comparison.

In this embodiment, the text document image shown in Fig. 4 is used as the carrier image for watermark embedding. Fig. 5 shows the result after all connected regions formed by characters have been traversed and the corresponding data bit has been embedded in each region.

Based on the above concept, reversing the embedding process, that is, extracting the embedded watermark information from quantities such as the black-dot counts in the watermarked text image and thereby restoring the text image, also yields a method for extracting a digital watermark from a binary text image. The extraction method is described below with reference to the drawings. Fig. 6 is a flow chart for extracting watermark information embedded by the above method; it includes the following steps:

S601. Scan the text document to obtain a grayscale digital image, and perform image segmentation;

The printed text image is scanned to obtain digitized image data. Because the embedding method described above relies on the black-dot count of each connected region, the digital image must be segmented into individual character image regions before the watermark information is extracted. The scanned text document may be processed with a region-based method, a boundary-based method, or an edge-based method.

Because the watermark is embedded by the step parity method, the black-dot count of a character image after print-and-scan processing determines whether the watermark can be extracted correctly. Owing to the uncertainties of the print-and-scan process, segmenting the characters with a fixed threshold would severely distort the boundary pixels of the character images. A thresholding algorithm must therefore determine a suitable threshold dynamically and segment the character boundaries as accurately as possible to ensure correct watermark extraction.

In this embodiment, extraction uses thresholding, a representative region-based segmentation method: the scanned grayscale image is first thresholded with Otsu's method to restore a binary image.
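Otsu's method picks the threshold that maximizes the between-class variance of the grayscale histogram. The pure-Python sketch below shows one common formulation; it assumes 8-bit gray values (0 to 255) and dark text on a light background, so pixels at or below the threshold become black (1). The function names are illustrative.

```python
def otsu_threshold(pixels):
    """Return the gray level t (0..255) that maximizes between-class variance,
    where class 0 holds values <= t and class 1 holds values > t."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0 = 0       # number of pixels in class 0
    sum0 = 0     # sum of gray values in class 0
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue
        w1 = total - w0
        if w1 == 0:
            break
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t


def binarize(gray):
    """Convert a grayscale image (list of rows) to 1 = black, 0 = white."""
    t = otsu_threshold(p for row in gray for p in row)
    return [[1 if p <= t else 0 for p in row] for row in gray]
```

Because the threshold is recomputed from each scan's own histogram, it adapts to differences in scanner brightness and contrast, which is the dynamic behaviour the text calls for.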

S602. Search the page in sequence for closed connected regions formed by characters, and count all black dots in each region;

When the image is traversed, the edge contour of each character image region is detected first, and then all black dots inside the closed contour are counted. The edge contour is detected in the same way as in the embedding process.

S603. Extract the watermark data bit embedded in each region according to the embedding rule and the black-dot count of the region;

From the black-dot count m of the connected region calculated in S602, the information bit w is extracted with the following formula:

After all information bits have been extracted, they can be decrypted according to the encryption applied at embedding time. In a specific implementation this may be done as follows: according to the structure of the original watermark bit string, the corresponding check data is taken out and used to verify the correctness of the bit string; data that passes the check then undergoes the final decryption and decompression, restoring the original watermark string.
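The verify-then-decompress order described above can be illustrated with a small round trip. This is only a stand-in built on assumptions: the text does not say which check data, cipher, or compression is used, so CRC-32 and zlib from the Python standard library are substituted here, and the encryption and decryption steps are left out entirely. `pack_payload` and `unpack_payload` are hypothetical names.

```python
import zlib


def pack_payload(message: bytes) -> bytes:
    """Compress the message and append 4 bytes of check data (CRC-32, assumed)."""
    body = zlib.compress(message)
    check = zlib.crc32(body).to_bytes(4, "big")
    return body + check


def unpack_payload(payload: bytes) -> bytes:
    """Verify the check data first, then decompress the body."""
    body, check = payload[:-4], payload[-4:]
    if zlib.crc32(body).to_bytes(4, "big") != check:
        raise ValueError("check data mismatch: extracted bits are corrupted")
    return zlib.decompress(body)
```

Checking before decompressing means that a bit error introduced by printing and scanning is reported as a verification failure instead of producing a garbled watermark string.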

Based on the above inventive concept, the invention also provides devices for digital watermark embedding and extraction, whose specific implementation is described below with reference to the drawings.

Fig. 7 is a schematic structural diagram of the watermark embedding device of the invention. As shown in the figure, the digital watermark embedding device includes a watermark information acquisition module 701, a contour code chain acquisition module 702, a black-dot counting module 703, a first number calculation module 704, a pixel flipping module 705, and an encryption module 706, connected as follows:

The first number calculation module 704 is connected to the watermark information acquisition module 701 and the black-dot counting module 703; the black-dot counting module 703 is also connected to the contour code chain acquisition module 702; the pixel flipping module 705 is connected to the contour code chain acquisition module 702 and the first number calculation module 704; and, preferably, the watermark information acquisition module 701 is also connected to the encryption module 706.

The function of each module and the working relationships among them are as follows:

The watermark information acquisition module acquires the watermark information bit string to be embedded; this bit string is one of the inputs to the first number calculation module;

The contour code chain acquisition module searches for a closed connected region formed by characters and obtains the contour code chain of that region. The black-dot counting module uses the contour code chain to count the black dots needed for calculating the first number, and the pixel flipping module also flips pixels along the contour code chain;

The black-dot counting module counts the black dots in the connected region obtained by the contour code chain acquisition module;

The first number calculation module calculates the first number of pixels to be flipped from the watermark information bit string obtained by the watermark information acquisition module, the black-dot count obtained by the black-dot counting module, and the first step size;

The pixel flipping module flips pixels along the contour code chain according to the first number. Using the contour code chain, it can flip a group of pixels consecutively. When black dots must be added, the contour is traversed in the direction of the code chain and white dots in the eight-neighbourhood of black dots are flipped to black until n pixels have been flipped. White dots on the character's bounding rectangle are not flipped. This keeps the character from growing outward and becoming visibly bolder, and prevents the boundaries of two closely spaced characters from merging, which would cause the extent of the connected region to be misjudged during extraction. When black dots must be removed, the contour is likewise traversed in the direction of the code chain and black contour dots are flipped to white until n pixels have been flipped. Black dots whose upper, lower, left, and right neighbours are all the same colour are not flipped, which helps keep the result visually smooth.

The encryption module encrypts the watermark information bit string. It exists for security: it can encrypt the compressed watermark information bit string, and the necessary check bits are inserted after the encrypted ciphertext string so that data correctness can be verified during extraction, which improves extraction accuracy.

Preferably, the contour code chain acquisition module 702 may include a first boundary point marking unit 7021, a first code chain traversal unit 7022, and a first code chain forming unit 7023.

The first boundary point marking unit determines and marks the boundary points of a character according to differences in colour among the points in the eight-neighbourhood. The first code chain traversal unit, using the boundary points marked by the first boundary point marking unit, finds in a first order the first boundary point that has not been traversed and takes it as the contour starting point, recording the direction of entry into the starting point and the code chain information; it then finds, in a second order based on the direction of entry, the next boundary point as a contour point, recording the direction of entry into that contour point and the code chain information, until it returns to the starting point. The first code chain forming unit obtains the contour code chain of the connected region from all the code chain information recorded by the first code chain traversal unit.

In implementation, the first order and the second order are set so that the information of every boundary point is obtained in sequence during traversal, which ensures that the resulting contour code chain is complete. On this principle, the first order may equally be bottom to top and then right to left, and so on; likewise, the second order may follow a "look up first", "look right first", or "look down first" rule.

In a preferred embodiment, the first number calculation module 704 may include a step size determination unit and a number calculation unit.

In a specific implementation, the step size determination unit may determine the first step size according to the step size used by the step parity method or the parity embedding method. The step parity method, used for the description below, is an extension of the parity embedding method: 0 or 1 is represented by the parity of the multiple of a step size Q to which the number of black pixels in the closed connected region corresponds. This gives the algorithm some fault tolerance: after a watermark attack, the bit can still be detected as long as the number of changed pixels does not exceed Q/2. An image obtained by printing and scanning looks the same as the original, but the process actually combines several image processing operations and substantially changes pixel values and geometric positions, so the watermarking algorithm must be highly robust to withstand it. The choice of Q is therefore particularly important. If the step size is too large, more pixels have to be changed, which noticeably affects the visual appearance of the original text image; if it is too small, the embedded data is easily lost in the print-and-scan process. As a general rule, the step size should not exceed twice the number of boundary pixels of the character image. In this preferred embodiment, a step size of 200 is chosen, based on the number of boundary points of characters in commonly used fonts.

The number calculation unit obtains the first number according to the following formula, where:

first number = n + original number of black dots; m is the number of black dots; n is the number of dots to be added: when n is greater than 0, black dots are added by flipping white pixels to black, and when n is less than 0, black dots are removed by flipping black pixels to white; w is the watermark information bit string to be embedded; and Q is the first step size.

The formula used in the preferred embodiment is explained below.

Characters with fewer than 750 black dots fall into the following cases:

In summary, in the preferred implementation the first step size Q is set to 200, and the number of character pixels to be flipped is calculated by the following formula:

If the process of the digital watermark embedding method is reversed, that is, if the embedded watermark information is extracted from quantities such as the black-dot counts in the watermarked text image, the text image can be restored. Based on this concept, the invention also provides a device for extracting a digital watermark from a binary text image, whose specific implementation is described below with reference to the drawings. Fig. 8 is a schematic structural diagram of the device for extracting watermark information. As shown in the figure, the digital watermark extraction device includes an image segmentation processing module 801, a character black-dot counting module 802, a watermark information extraction module 803, and a decryption module 804, connected in sequence; preferably, the decryption module 804 is also connected to the watermark information extraction module 803. Their functions, relationships, and implementation are described below:

Image segmentation processing module: processes the scanned text document to obtain individual character image regions.

After the printed text image has been scanned to obtain digitized image data, the black dots in each connected region must be counted, in line with the embedding principle described above. The digital image must therefore be segmented into individual character image regions before the watermark information is extracted. In a specific implementation, the scanned text document may be processed with a region-based method, a boundary-based method, or an edge-based method.

Because the watermark is embedded by the step parity method, the black-dot count of a character image after print-and-scan processing determines whether the watermark can be extracted correctly. Owing to the uncertainties of the print-and-scan process, segmenting the characters with a fixed threshold would severely distort the boundary pixels of the character images. In the preferred implementation, a thresholding algorithm should therefore determine a suitable threshold dynamically and segment the character boundaries as accurately as possible to ensure correct watermark extraction.

In this embodiment, extraction may use thresholding, a representative region-based segmentation method: the scanned grayscale image is first thresholded with Otsu's method to restore a binary image.

字符黑点个数统计模块用于搜索由字符组成的封闭连通区域，获取所述连通区域的轮廓码链，统计所述连通区域内的黑点个数。遍历图像时，需要先检测出字符图像区域的边缘轮廓，然后统计封闭轮廓区域内的所有黑点个数，其中边缘轮廓检测的方法跟嵌入过程相同。The module for counting the number of black dots in characters is used to search for a closed connected area composed of characters, obtain the outline code chain of the connected area, and count the number of black dots in the connected area. When traversing the image, it is necessary to detect the edge contour of the character image area first, and then count the number of all black dots in the closed contour area. The method of edge contour detection is the same as the embedding process.

Preferably, the character black dot number statistics module 802 may include a second boundary point marking unit 8021, a second code chain traversal unit 8022, a second code chain forming unit 8023, and a black dot counting unit 8024.

The second boundary point marking unit identifies and marks the boundary points of characters according to the differences in pixel color within the eight-neighborhood;

The second code chain traversal unit, based on the boundary points marked by the second boundary point marking unit, searches in a first order for the first boundary point not yet traversed and takes it as the contour starting point, recording the direction of entry into the starting point and the code chain information. Based on the entry direction, it then searches in a second order for the next boundary point as a contour point, recording the direction of entry into that contour point and the code chain information, until it returns to the starting point;

The second code chain forming unit obtains the contour code chain of the connected region from all the code chain information recorded by the second code chain traversal unit.

The second boundary point marking unit 8021, the second code chain traversal unit 8022, and the second code chain forming unit 8023 may form the contour code chain on the same principle as is used when embedding the watermark.

The black dot counting unit counts the number of black dots in the connected region according to the contour code chain.
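As a rough illustration of the boundary point marking unit and the black dot counting unit, the sketch below marks boundary points using the eight-neighborhood and counts black dots. It rests on several assumptions that are not stated in the text: a boundary point is taken to be a black pixel with at least one white (or out-of-image) pixel among its eight neighbors, the image is assumed to hold a single character region, and black dots are counted directly rather than by following the contour code chain. The function names and the 1 = black convention are illustrative only.

```python
from typing import List, Set, Tuple

Image = List[List[int]]  # assumed convention: 1 = black pixel, 0 = white pixel

# Offsets (row, column) of the eight neighbors of a pixel.
NEIGHBORS_8 = [(-1, -1), (-1, 0), (-1, 1),
               (0, -1),           (0, 1),
               (1, -1),  (1, 0),  (1, 1)]


def mark_boundary_points(img: Image) -> Set[Tuple[int, int]]:
    """Return the black pixels that touch a white pixel or the image edge."""
    rows, cols = len(img), len(img[0])
    boundary = set()
    for r in range(rows):
        for c in range(cols):
            if img[r][c] != 1:
                continue  # only black pixels can be boundary points
            for dr, dc in NEIGHBORS_8:
                nr, nc = r + dr, c + dc
                outside = not (0 <= nr < rows and 0 <= nc < cols)
                if outside or img[nr][nc] == 0:
                    boundary.add((r, c))
                    break
    return boundary


def count_black_dots(img: Image) -> int:
    """Count all black pixels in a single-character image."""
    return sum(sum(row) for row in img)


# A 3x3 black square on a white background: 8 boundary points, 9 black dots.
glyph = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
```

In this example only the center pixel of the square is interior, so every other black pixel is marked as a boundary point.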

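The watermark information extraction module extracts the watermark information bit string according to the number of black dots in the connected region and the first step size.

Preferably, the watermark extraction module may extract the watermark information bit string according to the following formula:

\begin{cases}
w = 0, & m \in \{ m \mid [m / Q + 0.5] \% 2 = 0,\ m \geq 750,\ m \leq 1700 \} \\
w = 1, & m \in \{ m \mid [m / Q + 0.5] \% 2 = 1,\ m \geq 750,\ m \leq 1700 \}
\end{cases}

where m is the number of black dots, w is the watermark information bit string, and Q is the first step size.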

A preferred implementation may also include a decryption module, connected to the watermark extraction module, for decrypting the extracted watermark information bit string. After all information bits have been extracted, the decryption module decrypts them according to the encryption applied at embedding time. In a specific implementation this may be done as follows. According to the structure of the original watermark information bit string, the corresponding check data is taken out and used to verify the correctness of the information string. Data that passes the check then undergoes the final decryption and decompression, restoring the original watermark string.
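The verify-then-decompress order described above can be sketched as follows. The text does not specify the check scheme, the compression method, or the cipher, so everything concrete here is an assumption: the payload is taken to be zlib-compressed data followed by a 4-byte big-endian CRC-32, the decryption step is omitted entirely, and the function names are hypothetical.

```python
import zlib


def pack_payload(message: bytes) -> bytes:
    """Compress the message and append check data (assumed: CRC-32, 4 bytes)."""
    body = zlib.compress(message)
    return body + zlib.crc32(body).to_bytes(4, "big")


def unpack_payload(payload: bytes) -> bytes:
    """Verify the check data first, then decompress the verified body."""
    body, check = payload[:-4], payload[-4:]
    if zlib.crc32(body).to_bytes(4, "big") != check:
        raise ValueError("check data mismatch: extracted bits are corrupted")
    return zlib.decompress(body)
```

Verifying before decompressing means a bit string damaged during printing and scanning is rejected rather than decoded into misleading content.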

Figure 9 is a flow chart of a specific embodiment of the present invention. It shows the complete process by which the watermark embedding and extraction methods and devices implement the invention, as a fuller illustration of how the general concept is carried out. As the figure shows, the process includes the following steps:

S901. Locate the connected regions in the original image;

S902. Obtain the contour code of the original image;

S903. Count the number of pixels;

S904. After adding redundant information to the watermark, apply the aforementioned formula to calculate the number of pixels to flip;

S905. Flip pixels along the contour;

S906. Obtain the watermarked image from the above steps;

S907. Threshold the watermarked image;

S908. Locate the connected regions;

S909. Count the number of pixels;

S910. Apply the formula to calculate the watermark bit;

S911. Extract the watermark.
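Step S910 can be sketched with the extraction formula given in claim 10. The sketch assumes that the square brackets in that formula denote the integer part (floor), and it uses integer arithmetic to avoid floating-point rounding. The function name and the step size of 100 used in the example are illustrative, not values from the patent.

```python
from typing import Optional

M_MIN, M_MAX = 750, 1700  # black dot range stated in the extraction formula


def extract_bit(m: int, q: int) -> Optional[int]:
    """Return the bit carried by a region with m black dots, or None if m is out of range."""
    if not (M_MIN <= m <= M_MAX):
        return None  # the formula defines no bit outside [750, 1700]
    k = (2 * m + q) // (2 * q)  # integer form of floor(m / q + 0.5)
    return k % 2


# With q = 100: 800 dots rounds to level 8 (bit 0), 900 dots to level 9 (bit 1).
bits = [extract_bit(800, 100), extract_bit(900, 100)]
```

Regions whose black dot count falls outside the stated range return `None` here, on the assumption that such regions carry no watermark bit.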

Under the inventive concept, the watermark is embedded by flipping pixels so as to change the number of black dots in a connected region. Because all text contains connected regions, the method and device are generally applicable to binary text images and can be used on text documents in any character set, including English, digits and symbols, Chinese characters, and Japanese.

Because embedding is controlled through the relationship between the step size and the number of black dots added, the method and device are highly robust to the print-scan process and can be applied both to conventionally printed text documents and to printed-and-scanned text documents.
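The link between step size and robustness can be illustrated with a simplified stand-in for the embedding rule. This is not the piecewise formula of claim 5: it ignores the 600, 750, and 1700 range limits and the 2/3 thresholds, and simply moves the black dot count to the nearest multiple of Q whose index has the parity of the bit. All names and the value Q = 100 are assumptions made for the example.

```python
def nearest_level(m: int, q: int) -> int:
    """Integer form of floor(m / q + 0.5): the index of the nearest multiple of q."""
    return (2 * m + q) // (2 * q)


def dots_to_add(m: int, w: int, q: int) -> int:
    """Signed change n such that m + n is a multiple of q whose index has parity w."""
    k = nearest_level(m, q)
    if k % 2 != w:
        # Wrong parity: step to the neighboring level on the side m already leans toward.
        k += 1 if m >= k * q else -1
    return k * q - m


def read_bit(m: int, q: int) -> int:
    """Recover the bit from a (possibly perturbed) black dot count."""
    return nearest_level(m, q) % 2


# Embed bit 1 in a region with 1000 black dots using step size 100.
n = dots_to_add(1000, 1, 100)
watermarked = 1000 + n
```

Under this simplified rule, any print-scan distortion that changes the black dot count by less than half the step size leaves the extracted bit unchanged, which is the sense in which a larger step size buys robustness at the cost of more flipped pixels.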

Compared with the prior art, the embedding and extraction processes of the method and device are simpler: they require no prior blocking or scrambling and do not depend on the typesetting format of the document content. This improves computational efficiency, and tracking remains convenient even under print-scan attacks.

Because the method and device depend only on the area occupied by the characters in the text document, their resistance to geometric attacks such as translation and rotation during printing and scanning is also enhanced.
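A small check makes the area argument concrete: the black dot count of a character does not change when the character is shifted or turned. The sketch below covers only exact translations and 90-degree rotations. An arbitrary rotation in a real scan involves resampling, so there the count would be preserved only approximately. The helper names and the 1 = black convention are assumptions for illustration.

```python
from typing import List

Image = List[List[int]]  # assumed convention: 1 = black pixel, 0 = white pixel


def black_count(img: Image) -> int:
    """Area of the character, measured as the number of black pixels."""
    return sum(sum(row) for row in img)


def translate(img: Image, dx: int, dy: int) -> Image:
    """Shift right by dx and down by dy, padding with white (dx, dy >= 0)."""
    width = len(img[0]) + dx
    top = [[0] * width for _ in range(dy)]
    return top + [[0] * dx + row for row in img]


def rotate90(img: Image) -> Image:
    """Rotate the image 90 degrees clockwise."""
    return [list(col) for col in zip(*img[::-1])]


# An L-shaped glyph with 4 black pixels.
glyph = [
    [1, 0],
    [1, 0],
    [1, 1],
]
```

Both `translate(glyph, 3, 2)` and `rotate90(glyph)` contain the same four black pixels as `glyph`, so a bit read from the count alone is unaffected by either transformation.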

Because the embedded information can be processed in various ways, such as compression, encryption, and the addition of check bits, the method and device make the embedded information flexible, allow a relatively large amount of information to be embedded, and make various security measures easy to implement. Extraction accuracy is also substantially higher than with existing methods.

Claims

1. A digital watermark embedding method, characterized in that it comprises the following steps:

Obtain the watermark information bit string to be embedded;

Search for a closed connected region composed of characters, and obtain the contour code chain of the connected region;

Calculate the first number of pixels that need to be flipped according to the number of black dots in the connected region, the watermark information bit string, and the first step size;

Flip the first number of pixels along the contour code chain.

2. The method according to claim 1, characterized in that, in the step of acquiring the watermark information bit string to be embedded, the watermark information bit string is encrypted.

3. The method according to claim 1, wherein the step of searching for a closed connected region composed of characters and obtaining the contour code chain of the connected region includes:

According to the difference of the color points in the eight neighborhoods, the boundary points of the characters are judged and marked;

Find the first boundary point that has not been traversed as the starting point of the contour in the first order, and record the direction of entering the starting point and the code chain information;

According to the entry direction described in the previous step, search for the next boundary point as the contour point in the second order, and record the direction of entering the contour point and the code chain information until returning to the starting point;

A contour code chain of the connected region is obtained according to all the recorded code chain information.

4. The method according to claim 1, characterized in that the first step size is the step size of the step size parity method or the parity embedding method.

5. The method according to claim 1, characterized in that the first number is obtained by the following formula:

The first number = n + the number of original black dots, wherein:

\begin{cases}
n = 0, & m \in (0, 600) \cup (1700, +\infty) \\
n = m - 600, & m \in [600, 750) \\
n = Q - m \% Q, & m \in \{ m \mid m \% Q > Q \times 2/3,\ m / Q = w,\ m \geq 600,\ m \leq 1700 \} \\
n = Q \times 5/3 - m \% Q, & m \in \{ m \mid m \% Q > Q \times 2/3,\ m / Q \neq w,\ m \geq 600,\ m \leq 1700 \} \\
n = m \% Q, & m \in \{ m \mid m \% Q > Q \times 2/3,\ m / Q = w,\ m \geq 600,\ m \leq 1700 \} \\
n = Q \times 2/3 - m \% Q, & m \in \{ m \mid m \% Q > Q \times 2/3,\ m / Q = 2,\ m \geq 600,\ m \leq 1700 \}
\end{cases}

In the above formula, m is the number of black dots and n is the number of dots to be added. When n is greater than 0, the number of black dots is increased by flipping white pixels to black pixels. When n is less than 0, the number of black dots is reduced by flipping black pixels to white pixels. w is the watermark information bit string to be embedded, and Q is the first step size.

6. The method according to claim 1, characterized in that, in the step of flipping pixels along the contour code chain according to the first number, the white dots on the rectangle surrounding the characters are not flipped.

7. A method for extracting a digital watermark, comprising the steps of:

Scan the text document and process it to obtain a single character image area;

Search for a closed connected region composed of characters, obtain the contour code chain of the connected region, and count the number of black dots in the connected region;

The watermark information bit string is extracted according to the number of black dots in the connected region and the first step size.

8. The method according to claim 7, wherein, in the step of scanning the text document and processing it, the text document is scanned and image segmentation is performed using the region method, the boundary method, or the edge method.

9. The method according to claim 7, wherein the step of searching for a closed connected region composed of characters, obtaining the contour code chain of the connected region, and counting the number of black dots in the connected region includes:

Obtaining the contour code chain of the connected region according to all the recorded code chain information;

Count the number of black dots in the connected area.

10. The method according to claim 7, wherein, in the step of extracting the watermark information bit string according to the number of black dots in the connected region and the first step size, the watermark information bit string is obtained according to the following formula:

\begin{cases}
w = 0, & m \in \{ m \mid [m / Q + 0.5] \% 2 = 0,\ m \geq 750,\ m \leq 1700 \} \\
w = 1, & m \in \{ m \mid [m / Q + 0.5] \% 2 = 1,\ m \geq 750,\ m \leq 1700 \}
\end{cases}

where m is the number of black dots, w is the watermark information bit string, and Q is the first step size.

11. The method of claim 7, further comprising the steps of:

Decryption is performed on the extracted watermark information bit string.

12. A digital watermark embedding device, characterized in that it comprises:

The watermark information acquisition module is used to acquire the watermark information bit string to be embedded;

A contour code chain acquisition module, configured to search for a closed connected region composed of characters, and obtain the contour code chain of the connected region;

A black dot number statistics module, connected to the contour code chain acquisition module, used to count the number of black dots in the connected region;

A first number calculation module, connected to the watermark information acquisition module and the black dot number statistics module, used to calculate the first number of pixels that need to be flipped according to the number of black dots, the watermark information bit string, and the first step size;

The pixel flipping module is connected with the first number calculation module and the contour code chain acquisition module, and is used for flipping pixels according to the first number along the contour code chain.

13. The device according to claim 12, further comprising an encryption module, connected to the watermark information acquisition module, for performing encryption processing on the watermark information bit string.

14. The device according to claim 12, wherein the contour code chain obtaining module comprises:

The first boundary point marking unit is used to judge the boundary points of characters according to the difference of color points in the eight neighborhoods and mark them;

The first code chain traversal unit is used to search in a first order, based on the boundary points marked by the first boundary point marking unit, for the first boundary point not yet traversed and take it as the contour starting point, recording the direction of entry into the starting point and the code chain information, and then, based on the entry direction, to search in a second order for the next boundary point as a contour point, recording the direction of entry into that contour point and the code chain information, until returning to the starting point;

The first code chain forming unit is configured to obtain the contour code chain of the connected region according to all the code chain information recorded by the code chain traversal unit.

15. The apparatus of claim 12, wherein the first number calculation module comprises:

A step size determination unit is used to determine the first step size according to the step size of the step size parity method or the parity embedding method;

A number calculation unit, used to obtain the first number according to the following calculation formula,

\begin{cases}
n = 0, & m \in (0, 600) \cup (1700, +\infty) \\
n = m - 600, & m \in [600, 750) \\
n = Q - m \% Q, & m \in \{ m \mid m \% Q > Q \times 2/3,\ m / Q = w,\ m \geq 600,\ m \leq 1700 \} \\
n = Q \times 5/3 - m \% Q, & m \in \{ m \mid m \% Q > Q \times 2/3,\ m / Q \neq w,\ m \geq 600,\ m \leq 1700 \} \\
n = m \% Q, & m \in \{ m \mid m \% Q > Q \times 2/3,\ m / Q = w,\ m \geq 600,\ m \leq 1700 \} \\
n = Q \times 2/3 - m \% Q, & m \in \{ m \mid m \% Q > Q \times 2/3,\ m / Q = 2,\ m \geq 600,\ m \leq 1700 \}
\end{cases}

wherein the first number = n + the number of original black dots, m is the number of black dots, and n is the number of dots to be added. When n is greater than 0, the number of black dots is increased by flipping white pixels to black pixels. When n is less than 0, the number of black dots is reduced by flipping black pixels to white pixels. w is the watermark information bit string to be embedded, and Q is the first step size.

16. A device for extracting a digital watermark, comprising:

Image segmentation processing module: used to process the scanned text document to obtain a single character image area;

Character black dot number statistics module: connected to the image segmentation processing module, used to search for a closed connected region composed of characters, obtain the contour code chain of the connected region, and count the number of black dots in the connected region;

Watermark information extraction module: connected to the character black dot number statistics module, used to extract the watermark information bit string according to the number of black dots in the connected region and the first step size.

17. The device according to claim 16, characterized in that the character black dot number statistics module comprises:

The second boundary point marking unit is used to judge the boundary points of characters according to the difference of color points in the eight neighborhoods and mark them;

The second code chain traversal unit is used to search in a first order, based on the boundary points marked by the second boundary point marking unit, for the first boundary point not yet traversed and take it as the contour starting point, recording the direction of entry into the starting point and the code chain information, and then, based on the entry direction, to search in a second order for the next boundary point as a contour point, recording the direction of entry into that contour point and the code chain information, until returning to the starting point;

A second code chain forming unit, configured to obtain the contour code chain of the connected region according to all the code chain information recorded by the code chain traversal unit;

The black dot counting unit is used to count the number of black dots in the connected region.

18. The device according to claim 16, wherein the watermark extraction module extracts the watermark information bit string according to the following formula:

\begin{cases}
w = 0, & m \in \{ m \mid [m / Q + 0.5] \% 2 = 0,\ m \geq 750,\ m \leq 1700 \} \\
w = 1, & m \in \{ m \mid [m / Q + 0.5] \% 2 = 1,\ m \geq 750,\ m \leq 1700 \}
\end{cases}

19. The device according to claim 16, further comprising a decryption module, connected to the watermark extraction module, for decrypting the extracted watermark information bit string.