Summary of the invention
The object of the invention is to overcome the shortcomings of existing web text protection methods, namely large transmission volume and poor stability, by proposing a picture-based method for protecting web text. The method prevents users from obtaining and redistributing the text content of a page, effectively protects the copyright and interests of authors of online literary works, and at the same time reduces losses to the website.
To achieve this object, the text protection method provided by the invention comprises:
(1) obtaining the text content at the Web server and scrambling the order of the text;
(2) randomly selecting a small portion of the text and generating a picture from it;
(3) encrypting the remaining text that was not rendered into the picture, together with the coordinate information of each character, and compressing the result;
(4) saving the compressed information and the generated picture in an html page and transmitting the page to the client;
(5) at the client, after the html page is received, extracting the picture and using it as the background picture, then decompressing and decrypting the received compressed information to restore the original information;
(6) at the client, rendering each character of the restored information into pixels to generate a small picture containing only that character, superimposing that picture on the background picture according to the character's coordinate information, and finally displaying to the user the full page containing the complete text.
The present invention has the following advantages:
1) because only a small, randomly selected portion of the text is rendered into a picture, the invention reduces the amount of information transmitted;
2) because the text is scrambled and the remaining text and its coordinates are encrypted and compressed, information captured during transmission is difficult to restore, and the text cannot be obtained by viewing the page source;
3) because interference information and content such as fonts from the standard font library are added to the background picture, the invention effectively hinders OCR recognition and further protects the text content of the web page;
4) because each character of the remaining text is rendered into pixels, the invention effectively prevents the page text from being obtained and redistributed by copy and paste.
Embodiment
The technical solution of the invention is further described below with reference to the accompanying drawings and a specific implementation.
Fig. 1 shows the working principle of a web browser. The upper half shows the interaction between the client computer and the server; the lower half shows the interaction between the browser and the web server. The browser first sends a request to the web server; the web server responds to the request and sends the response data, normally an html file, to the client browser, which then displays the contents of the html file on the user's screen. This is the most basic principle of network applications. The processing module of the web server processes the text and generates the pixel information that the client needs to compose the picture.
With reference to Fig. 2, the workflow of the invention at the server comprises the following steps:
Step 1: obtain the text content at the Web server and scramble the order of the text.
When the user issues a URL request to browse the literary content of a web page, the browser sends the request to the server. On receiving the request, the server looks up the page the user wants to browse and the literary content it contains, and extracts that content. To raise the security level of the text protection, the text is then scrambled: the user account, login time and IP address are used as the random seed, and the original order of the text is shuffled.
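The seeded scrambling in Step 1 can be sketched as follows. This is only an illustration under stated assumptions: the text does not say how the three values are combined into a seed, so hashing their concatenation with SHA-256 is an assumption, and the names `scramble` and `restore` are hypothetical.

```python
import hashlib
import random

def scramble(text, account, login_time, ip):
    """Return the scrambled characters and each character's original index."""
    # Assumption: the seed is a SHA-256 hash of the three per-session values.
    material = f"{account}|{login_time}|{ip}".encode("utf-8")
    seed = int.from_bytes(hashlib.sha256(material).digest(), "big")
    rng = random.Random(seed)
    order = list(range(len(text)))
    rng.shuffle(order)
    # order[i] is the original position of the i-th scrambled character.
    return [text[i] for i in order], order

def restore(chars, order):
    """Put every scrambled character back at its original position."""
    original = [""] * len(chars)
    for ch, pos in zip(chars, order):
        original[pos] = ch
    return "".join(original)
```

Because the seed depends on the account, login time and IP address, the same session always yields the same permutation, while different sessions generally yield different ones.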
Step 2: randomly select a small portion of the text and generate a picture.
From the scrambled text, a small portion of the content is selected at random, a font is selected at random from the font library at the server, and the pixel information of the selected content is generated and composed into a picture.
To hinder OCR recognition, interference such as image noise is added while the pixels are generated. Specifically, background pixels, interference pixels and noise lines are added as appropriate; the characters are given shadows and deformed; and glyphs from the standard font library and, where appropriate, from a custom font library are added.
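A minimal sketch of the random selection and of one kind of interference is given below. It assumes a picture is a list of rows of 0/1 pixels and leaves glyph rendering out entirely; the 10% fraction, the 5% noise density and the function names are illustrative assumptions, not values taken from the text.

```python
import random

def split_for_background(entries, fraction=0.1, rng=None):
    """Randomly pick a small share of entries for the background picture."""
    rng = rng or random.Random()
    count = max(1, round(len(entries) * fraction)) if entries else 0
    chosen = set(rng.sample(range(len(entries)), count))
    background = [e for i, e in enumerate(entries) if i in chosen]
    remaining = [e for i, e in enumerate(entries) if i not in chosen]
    return background, remaining

def add_noise(bitmap, density=0.05, rng=None):
    """Return a copy of a 0/1 bitmap with roughly `density` of its pixels flipped."""
    rng = rng or random.Random()
    noisy = [row[:] for row in bitmap]
    for row in noisy:
        for x in range(len(row)):
            if rng.random() < density:
                row[x] ^= 1  # flip the pixel to act as an interference pixel
    return noisy
```

Noise lines, shadows and deformation would be applied in the same pass in a fuller implementation; only pixel flipping is shown here.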
Step 3: encrypt the remaining text that was not rendered into the picture, together with the coordinate information of each character, and compress the result.
So that the client knows the exact position of each character when it restores the original text, the coordinates of each character in the original text must be recorded before scrambling.
First, the remaining text and the pixel coordinates of each of its characters in the generated picture are obtained; to keep the transmission secure, this information is encrypted. The encrypted information is then compressed before it is transmitted, which improves transmission efficiency and makes reading smoother for the reader.
Given how browsers operate, efficiency requires that most of the work be done at the browser client; on the other hand, improving the security of network transmission requires the server to take certain measures. To balance transmission efficiency against transmission security, a small portion of the text is therefore transmitted as a picture and most of it is transmitted as text, so that the two requirements are better reconciled.
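The encrypt-then-compress step and its inverse at the client can be sketched as below. The text names neither the cipher nor the compression format, so both are assumptions: `zlib` stands in for the compressor, and a SHA-256-based XOR keystream stands in for the cipher purely as a placeholder (it is not a recommendation for real use). Each entry is assumed to be a `(character, x, y)` tuple.

```python
import hashlib
import json
import zlib

def _keystream(key: bytes, length: int) -> bytes:
    """Placeholder keystream: SHA-256 over key plus a running counter."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])

def xor_cipher(data: bytes, key: bytes) -> bytes:
    """XOR with the keystream; applying it twice restores the input."""
    return bytes(a ^ b for a, b in zip(data, _keystream(key, len(data))))

def pack(entries, key: bytes) -> bytes:
    """Server side: serialize, encrypt, then compress, in the order the text gives."""
    plain = json.dumps(entries, ensure_ascii=False).encode("utf-8")
    return zlib.compress(xor_cipher(plain, key))

def unpack(blob: bytes, key: bytes):
    """Client side: decompress, then decrypt, then parse back into tuples."""
    plain = xor_cipher(zlib.decompress(blob), key)
    return [tuple(e) for e in json.loads(plain.decode("utf-8"))]
```

The order of operations follows the description: encryption first and compression second at the server, with the client undoing them in reverse.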
Step 4: save the compressed information and the generated picture in an html page and transmit the page to the client.
The generated picture and the encrypted, compressed sensitive information are written into the html page, which is then transmitted to the client browser. With this processing, even if the html page is intercepted during network transmission, the interceptor obtains only a picture containing a small portion of the text, plus scrambled text and coordinate information that have been encrypted and compressed. Saving the compressed information and the generated picture in the html page before transmitting it to the client therefore protects the page text effectively.
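One way to carry both items in a single html page is sketched below. The text does not specify how they are embedded, so the base64 data URL, the PNG media type, the `script` container and the element ids `background` and `payload` are all assumptions made for illustration.

```python
import base64

def build_page(image_png: bytes, payload: bytes) -> str:
    """Embed the background picture and the packed text data in one html page."""
    img_b64 = base64.b64encode(image_png).decode("ascii")
    data_b64 = base64.b64encode(payload).decode("ascii")
    return (
        "<!DOCTYPE html>\n<html><body>\n"
        # Hypothetical id "background": the picture generated in Step 2.
        f'<img id="background" src="data:image/png;base64,{img_b64}">\n'
        # Hypothetical id "payload": the encrypted, compressed text and coordinates.
        f'<script id="payload" type="application/octet-stream">{data_b64}</script>\n'
        "</body></html>\n"
    )
```

Someone viewing the page source sees only base64 strings, which matches the claim that the text cannot be read directly from the source.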
With reference to Fig. 3, the processing flow of the invention in the browser comprises the following steps:
Step A: receive the html page sent by the server and parse out the key information in it.
As soon as the client browser receives the html sent by the web server, it analyzes the page content and parses out the key sensitive information in the page: the generated picture, and the encrypted, compressed remaining text and coordinate information.
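The parsing in Step A can be sketched with a small parser. In the invention this work is done inside the browser; Python is used here only to illustrate the logic. The page layout is an assumption: the picture is taken to be an `img` element with the hypothetical id `background` holding a base64 data URL, and the packed text a `script` element with the hypothetical id `payload` holding base64 data.

```python
import base64
from html.parser import HTMLParser

class PageParser(HTMLParser):
    """Collect the background picture bytes and the packed payload bytes."""

    def __init__(self):
        super().__init__()
        self.image = None
        self.payload = None
        self._chunks = None  # a list while inside the payload element, else None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "img" and attrs.get("id") == "background":
            src = attrs.get("src") or ""
            if "," in src:
                # Everything after the first comma of a data URL is the base64 body.
                self.image = base64.b64decode(src.split(",", 1)[1])
        elif tag == "script" and attrs.get("id") == "payload":
            self._chunks = []

    def handle_data(self, data):
        if self._chunks is not None:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._chunks is not None:
            self.payload = base64.b64decode("".join(self._chunks).strip())
            self._chunks = None

def parse_page(page: str):
    """Return (image_bytes, payload_bytes) found in the page, or None for each if absent."""
    parser = PageParser()
    parser.feed(page)
    parser.close()
    return parser.image, parser.payload
```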
Step B: obtain the picture and use it as the background picture.
From the information obtained in Step A, the picture is extracted. This is the picture generated in server-side Step 2, containing a small portion of the text and the interference information that hinders OCR recognition. It is used as the background picture of the client web page.
Step C: obtain the remaining text information and decompress and decrypt it.
The browser client first decompresses the text information obtained in Step A and then decrypts it with the encryption key, yielding the scrambled text information.
Step D: generate a picture for each character.
The browser client reads the characters in the information one by one, renders each into pixel information, and generates a picture containing that character. The same OCR-hindering techniques are used while the picture is generated: as the pixel information is produced, image noise is added, including background pixels, interference pixels and noise lines; the characters are given shadows and deformed; and glyphs from the standard font library and, where appropriate, from a custom font library are added.
Step E: according to the position of each character in the picture, read the coordinate information of each character's small picture in the background picture and superimpose each small picture at the corresponding position of the background picture. The result is a picture web page containing the complete text, which is displayed to the user.
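The superposition in Step E can be sketched as follows, again treating a picture as a list of rows of pixel values. The sketch assumes each coordinate gives the upper-left corner of the small picture within the background; the function names are hypothetical.

```python
def overlay(background, tile, left, top):
    """Copy `tile` onto `background` in place, with (left, top) as its upper-left corner."""
    for dy, row in enumerate(tile):
        for dx, value in enumerate(row):
            y, x = top + dy, left + dx
            # Pixels that would fall outside the background are skipped.
            if 0 <= y < len(background) and 0 <= x < len(background[y]):
                background[y][x] = value
    return background

def compose(background, tiles):
    """Return a new picture with every (tile, (left, top)) pair superimposed."""
    page = [row[:] for row in background]  # leave the original background untouched
    for tile, (left, top) in tiles:
        overlay(page, tile, left, top)
    return page
```

For example, `compose(bg, [(glyph, (16, 0))])` places `glyph` sixteen pixels from the left edge on the top row of `bg`.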
The effect of the invention is further illustrated by the following experimental example:
1. The original text selected for this experiment is "in the little a compound occupied by many households of interior emperor's diet room; because some sees that the people of dinner party does not also return; some is but slept down early; thereby whole courtyard is all dark; quiet to the utmost point quiet; as to touch the black room of wanting to give for change oneself, but again and again attacked at heart by a kind of fear, the broken snow that falls once in a while also can be shied me, two bunches of glittering lights sparkle, be unique interdependent seemingly in such cold night, for each other radiance exist, if there is one to go out, in another small cup ... the capital is permanent lonely in the dark ... transmit deserted and lonely sounding the night watches far away, a unexpected sound, at heart also as being frightened, indistinct to trembling ... I do not know what oneself is fearing, but have a hunch a kind of atmosphere of constraining gradually to the limit, begin to split and be dispersed in this Forbidden City, some thing really is premonitory ... be that I have thought suddenly like this after how long having crossed? " The coordinates of the characters are then extracted, and finally the original order of the text is scrambled using the user account, login time and IP address as the random seed.
2. At the server, the experiment randomly selects a small portion of the text, "the interior food for the emperor is little, and put in order the courtyard black because of dinner be that only this existence is gone to planting the also bright sample of idol snow; the distant sound prominent acoustic shock of sounding the night watches in being; war gradually begin very pre-in the city ...", 46 characters and punctuation marks in total, and generates a picture from it. Interference information that hinders OCR recognition is added to the generated picture, and fonts from the standard font library and the custom font library are used. Fig. 4 shows the resulting picture of this small portion of the text.
3. The scrambled text and coordinate information are encrypted and compressed; the picture information and the compressed information are then saved in the html page and returned to the client.
4. The client receives the information sent by the server, takes the picture in it as the background picture, and decompresses and decrypts the information to obtain the scrambled text and coordinate information.
5. The characters in the information are read one by one, and a small picture with interference information is generated for each. Fig. 5 shows, as an example, the picture generated for the character "driving", with interference information added.
6. The coordinates of each character in the background picture are read, and the picture of each character is superimposed on the background picture according to its coordinate information. In the implementation, the upper-left corner of the region that a character occupies in the background picture is used as that character's position in the picture. Once the picture of every character has been superimposed on the background picture, the picture containing the whole text can be seen, as shown in Fig. 6. Fig. 6 shows that the experiment renders the text as pictures, adds interference pixels, and uses fonts from the standard font library and the custom font library, which effectively hinders OCR recognition, prevents the information from being stolen, and protects the web page text.
This experiment shows that the picture-based method for protecting web text proposed by the invention not only improves transmission efficiency but also, to a certain extent, secures the transmission, thereby effectively protecting the rights and interests of online authors and of paid websites.