CN111683098B - Anti-crawler method and device, electronic equipment and storage medium - Google Patents
- Publication number
- CN111683098B (CN202010527694.XA)
- Authority
- CN
- China
- Prior art keywords
- text
- content
- icon
- display
- text content
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/04—Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks
- H04L63/0428—Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks wherein the data content is protected, e.g. by encrypting or encapsulating the payload
- H04L63/0435—Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks wherein the data content is protected, e.g. by encrypting or encapsulating the payload wherein the sending and receiving network entities apply symmetric encryption, i.e. same key used for encryption and decryption
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/04—Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks
- H04L63/0428—Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks wherein the data content is protected, e.g. by encrypting or encapsulating the payload
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/20—Network architectures or network communication protocols for network security for managing network security; network security policies in general
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/50—Network services
- H04L67/56—Provisioning of proxy services
- H04L67/565—Conversion or adaptation of application format or content
Landscapes
- Engineering & Computer Science (AREA)
- Computer Security & Cryptography (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Computer Hardware Design (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- User Interface Of Digital Computer (AREA)
Abstract
The application provides an anti-crawler method, an anti-crawler device, electronic equipment and a storage medium, and belongs to the field of data security. The method comprises the following steps: acquiring text content from a server; converting a specified text in the text content into an icon based on the mapping relation between characters and the icon, and obtaining display content consisting of a residual text and the icon, wherein the residual text is other texts in the text content except the specified text; displaying the display content in a page. According to the method, the characters of part of the text content are replaced by the icons through the display terminal, so that the page display content is prevented from being crawled, and the anti-crawler processing efficiency is improved.
Description
Technical Field
The application relates to the technical field of data security, in particular to an anti-crawler method, an anti-crawler device, electronic equipment and a storage medium.
Background
An HTML (HyperText Markup Language) document is an important carrier of internet data and may contain sensitive or important information. To extract such information quickly, some organizations or individuals crawl web pages with a web crawler. A web crawler is a program or script that automatically crawls the World Wide Web according to certain rules.
In the prior art, anti-crawler protection usually has the front end and the back end encrypt and decrypt the data they exchange, that is, sensitive parameter information and results are symmetrically encrypted and decrypted. The defect of this approach is that plaintext information is still displayed on the page, and once plaintext is displayed on the page, the desired information can be crawled by parsing the page. Methods that change text attributes in the HTML document have therefore appeared, but the existing methods of converting text attributes suffer from low processing efficiency.
Disclosure of Invention
In view of this, an object of the embodiments of the present application is to provide an anti-crawler method, an anti-crawler device, an electronic device, and a storage medium, so as to solve the problem in the prior art that anti-crawler processing efficiency is low.
The embodiment of the application provides an anti-crawler method, which is applied to a display end, and comprises the following steps: acquiring text content from a server; converting a specified text in the text content into an icon based on the mapping relation between characters and the icon, and obtaining display content consisting of a residual text and the icon, wherein the residual text is other texts in the text content except the specified text; displaying the display content in a page.
In this implementation, the specified text in the text content to be displayed is converted into icons, so that a web crawler cannot accurately crawl the text content and the anti-crawler function of the page is ensured. At the same time, the display end performs the text-to-icon conversion, which spares the server large-batch conversion work and improves anti-crawler processing efficiency.
Optionally, the obtaining the text content includes: acquiring encryption information containing the text content from the server; and decrypting the encrypted information based on a symmetric encryption algorithm matched with the server side to obtain the text content.
In the implementation mode, the display terminal and the server encrypt data such as text content when transmitting the data, so that the data security is improved.
Optionally, before the converting the specified text in the text content into the icon based on the mapping relationship between the characters and the icon, the method further includes: acquiring the mapping relation from the server; and storing the mapping relation based on a json format or a list format.
In this implementation, the display end stores the mapping relation as a file in json format or list format, so that text is replaced with icons based on local data, which improves anti-crawler processing efficiency.
Optionally, the converting the specified text in the text content into an icon based on the mapping relationship between the characters and the icon includes: determining the attribute name of the cascading style sheet corresponding to the specified text based on the mapping relation; and replacing the designated text with an icon corresponding to the attribute name of the cascading style sheet.
In the implementation mode, the text and the icon are replaced based on the attribute name of the cascading style sheet, text characters are prevented from being used as the icon name, and the efficiency and the safety of the text and icon replacement operation are guaranteed.
The embodiment of the application also provides an anti-crawler method, which is applied to a server side and comprises the following steps: determining text content to be displayed by a display end and a mapping relation between characters corresponding to the text content and icons; and sending the text content and the mapping relation to the display end, wherein the display end is used for converting the specified text in the text content into the icon based on the mapping relation between the characters and the icon, obtaining the display content consisting of the residual text and the icon, and displaying the display content in the page, wherein the residual text is other texts in the text content except the specified text.
In this implementation, the server side sends the text content to be displayed to the display end, and the display end replaces the specified text with the corresponding icons. The mapping relation between characters and icons needs to be sent only once per display end, and the server side does not need to perform large-batch replacement operations, so the load on the server side is reduced and anti-crawler processing efficiency is improved.
Optionally, before the determining the text content that needs to be displayed on the display end and the mapping relationship between the characters corresponding to the text content and the icons, the method further includes: generating an icon corresponding to the designated character by using a vector icon library, wherein the display style of the icon is the same as that of the corresponding character; and adopting the ASCII code of the character corresponding to the icon as the attribute name of the cascading style sheet of the icon.
In the implementation mode, the icon corresponding to the designated character is generated through the vector icon library, the efficiency of icon generation operation is improved, and meanwhile, the security of anti-crawling is improved by adopting the ASCII code of the character corresponding to the icon as the attribute name of the cascading style sheet of the icon.
Optionally, the sending the text content and the mapping relationship to the display end includes: encrypting the text content and the mapping relation based on a symmetric encryption algorithm matched with the display end to obtain encrypted information; and sending the encrypted information to the display terminal.
In the implementation mode, the display end and the server end encrypt data such as text content when transmitting the data, and the data security is improved.
The embodiment of the application further provides an anti-crawler device applied to a display end, the device including: a text acquisition module, used for acquiring text content from the server; a text replacement module, used for converting a specified text in the text content into an icon based on the mapping relation between the characters and the icon to obtain display content consisting of a residual text and the icon, wherein the residual text is the text in the text content other than the specified text; and a display module, used for displaying the display content in a page.
In this implementation, the specified text in the text content to be displayed is converted into icons, so that a web crawler cannot accurately crawl the text content and the anti-crawler function of the page is ensured. At the same time, the display end performs the text-to-icon conversion, which spares the server large-batch conversion work and improves anti-crawler processing efficiency.
Optionally, the text obtaining module is specifically configured to: acquiring encryption information containing the text content from the server; and decrypting the encrypted information based on a symmetric encryption algorithm matched with the server side to obtain the text content.
In the implementation mode, the display end and the server end encrypt data such as text content when transmitting the data, and the data security is improved.
Optionally, the anti-crawler device further comprises: a mapping relation obtaining module, used for obtaining the mapping relation from the server and storing the mapping relation based on the json format or the list format.
In this implementation, the display end stores the mapping relation as a file in json format or list format, so that text is replaced with icons based on local data, which improves anti-crawler processing efficiency.
Optionally, the text replacement module is specifically configured to: determining the attribute name of the cascading style sheet corresponding to the specified text based on the mapping relation; and replacing the designated text with an icon corresponding to the attribute name of the cascading style sheet.
In the implementation mode, the text and the icon are replaced based on the attribute name of the cascading style sheet, text characters are prevented from being used as the icon name, and the efficiency and the safety of the text and icon replacement operation are guaranteed.
The embodiment of the application further provides an anti-crawler device, which is applied to a server, the device comprises: the content determining module is used for determining the text content to be displayed by the display end and the mapping relation between the characters corresponding to the text content and the icons; and the sending module is used for sending the text content and the mapping relation to the display end, converting the specified text in the text content into an icon by the display end based on the mapping relation between the characters and the icon, obtaining display content consisting of residual text and the icon, and displaying the display content in a page, wherein the residual text is other texts in the text content except the specified text.
In this implementation, the server side sends the text content to be displayed to the display end, and the display end replaces the specified text with the corresponding icons. The mapping relation between characters and icons needs to be sent only once per display end, and the server side does not need to perform large-batch replacement operations, so the load on the server side is reduced and anti-crawler processing efficiency is improved.
Optionally, the anti-crawler apparatus further comprises: the icon generating module is used for generating an icon corresponding to the designated character by utilizing a vector icon library, and the display style of the icon is the same as that of the corresponding character; and adopting the ASCII code of the character corresponding to the icon as the attribute name of the cascading style sheet of the icon.
In the implementation mode, the icon corresponding to the designated character is generated through the vector icon library, the icon generation operation efficiency is improved, and meanwhile, the security of anti-crawling is improved by adopting the ASCII code of the character corresponding to the icon as the attribute name of the cascading style sheet of the icon.
Optionally, the sending module is specifically configured to: encrypting the text content and the mapping relation based on a symmetric encryption algorithm matched with the display end to obtain encrypted information; and sending the encrypted information to the display terminal.
In the implementation mode, the display terminal and the server encrypt data such as text content when transmitting the data, so that the data security is improved.
An embodiment of the present application further provides an electronic device, where the electronic device includes a memory and a processor, where the memory stores program instructions, and the processor executes the steps in any one of the foregoing implementation manners when reading and executing the program instructions.
An embodiment of the present application further provides a storage medium, where computer program instructions are stored in the storage medium, and when the computer program instructions are read and executed by a processor, the steps in any one of the above implementation manners are performed.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are required to be used in the embodiments of the present application will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present application and therefore should not be considered as limiting the scope, and that those skilled in the art can also obtain other related drawings based on the drawings without inventive efforts.
Fig. 1 is a schematic flowchart of an anti-crawler method applied to a display end according to an embodiment of the present application;
Fig. 2 is a schematic flowchart of the steps for converting specified text into icons according to an embodiment of the present application;
Fig. 3 is a schematic flowchart of an anti-crawler method applied to a server according to an embodiment of the present application;
Fig. 4 is a schematic block diagram of an anti-crawler device applied to a display end according to an embodiment of the present application;
Fig. 5 is a schematic block diagram of an anti-crawler device applied to a server according to an embodiment of the present application.
Reference numerals: 30-anti-crawler device; 31-text acquisition module; 32-text replacement module; 33-display module; 40-anti-crawler device; 41-content determination module; 42-sending module.
Detailed Description
The technical solution in the embodiments of the present application will be described below with reference to the drawings in the embodiments of the present application.
Referring to fig. 1, fig. 1 is a schematic flow chart diagram of an anti-crawler method applied to a display end according to an embodiment of the present application, where the specific steps of the anti-crawler method may be as follows:
Step S12: acquire the text content from the server.
Optionally, the text content is typically an HTML document, but it may also be a document in another format. An example of text content is "Zhang San of the first company publishes a table document in the second issue".
Further, the text content sent by the server may be encrypted information, which prevents an attacker from illegally obtaining the text content during data transmission and improves data transmission security in the anti-crawler processing flow. In that case, acquiring the text content may include: acquiring encrypted information containing the text content from the server; and decrypting the encrypted information based on a symmetric encryption algorithm matched with the server to obtain the text content.
It should be understood that, in addition to symmetric encryption algorithms such as AES (Advanced Encryption Standard), DES (Data Encryption Standard) and IDEA (International Data Encryption Algorithm), other encryption algorithms such as asymmetric encryption algorithms may also be used for the encryption and decryption operations.
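The shared-key round trip described above can be sketched as follows. This is only an illustration under stated assumptions: the Python standard library has no AES, DES or IDEA, so the sketch substitutes a toy keystream cipher built from SHA-256 purely to show that the server and the display end use the same key. It is not one of the algorithms named in the text and is not suitable for production use; the key value and function names are hypothetical.

```python
import hashlib
import os


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Derive `length` pseudo-random bytes from the shared key and a nonce."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def encrypt(key: bytes, text_content: str) -> bytes:
    """Server side: encrypt the text content with the shared key (toy stand-in)."""
    nonce = os.urandom(16)  # fresh per message, prepended to the ciphertext
    data = text_content.encode("utf-8")
    stream = _keystream(key, nonce, len(data))
    return nonce + bytes(a ^ b for a, b in zip(data, stream))


def decrypt(key: bytes, encrypted_info: bytes) -> str:
    """Display end: recover the text content using the same shared key."""
    nonce, body = encrypted_info[:16], encrypted_info[16:]
    stream = _keystream(key, nonce, len(body))
    return bytes(a ^ b for a, b in zip(body, stream)).decode("utf-8")


shared_key = b"hypothetical-shared-key"  # assumption: agreed out of band
encrypted_info = encrypt(shared_key, "Zhang San of the first company")
text_content = decrypt(shared_key, encrypted_info)
```

A real deployment would replace `encrypt` and `decrypt` with a vetted implementation of one of the named algorithms while keeping the same call shape.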
Step S14: convert the specified text in the text content into icons based on the mapping relation between characters and icons to obtain display content consisting of the residual text and the icons, wherein the residual text is the text in the text content other than the specified text.
Before the specified text is replaced with icons based on the mapping relation, the mapping relation that governs the replacement needs to be received from the server. Specifically, receiving the mapping relation may include: acquiring the mapping relation from the server; and saving the mapping relation based on the json format or the list format. Once the mapping relation is stored locally at the display end in json format or list format, the specified text can be replaced with icons directly from the local mapping relation, without the server performing large-batch replacement work, so the load on the server is reduced and the overall efficiency of anti-crawler processing is improved.
It should be understood that after the text content is received from the server, the specified text needs to be replaced with icons based on the mapping relation between characters and icons. The specified text and icons to be replaced may differ between the text content sent by different servers, and the display end may store multiple mapping relations. Therefore, this embodiment may first select the mapping relation corresponding to the server, which specifically includes the following step: determining, based on the server, a specified mapping relation corresponding to the server from the locally stored mapping relations, so as to convert the specified text in the text content into icons using the specified mapping relation.
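As a minimal sketch of saving the mapping relation in json format at the display end, the following stores a mapping received from the server in a local file and reads it back. The file name, the function names and the sample entries are assumptions made for illustration.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict


def save_mapping(mapping: Dict[str, str], path: Path) -> None:
    """Persist the mapping relation locally in json format."""
    path.write_text(json.dumps(mapping, ensure_ascii=False), encoding="utf-8")


def load_mapping(path: Path) -> Dict[str, str]:
    """Read the locally stored mapping relation back into memory."""
    return json.loads(path.read_text(encoding="utf-8"))


# Hypothetical mapping as received from the server:
# specified text -> cascading style sheet attribute name of its icon.
mapping_from_server = {
    "first company": "diyigongsi",
    "Zhang San": "zhangsan",
    "second issue": "dierqikan",
}

store = Path(tempfile.mkdtemp()) / "mapping.json"  # assumed local location
save_mapping(mapping_from_server, store)
local_mapping = load_mapping(store)
```

Later replacements can then read `local_mapping` without contacting the server again.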
In other embodiments, in addition to determining the designated mapping relationship corresponding to the server based on the server identity identifier such as the serial number of the server, different text contents transmitted by the same server may also need to adopt different mapping relationships, so that the designated mapping relationship may be determined based on the encryption mode of the encryption information, the text content identifier, and the like.
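Selecting the specified mapping relation can be sketched as a lookup keyed by the server identity and, optionally, a text content identifier. The key layout, the identifiers and the fallback to a per-server default are assumptions, not details given in the text.

```python
from typing import Dict, Optional, Tuple

Mapping = Dict[str, str]

# Locally stored mapping relations, keyed by (server id, content id).
# A content id of None marks the default mapping for that server.
local_mappings: Dict[Tuple[str, Optional[str]], Mapping] = {
    ("server-001", None): {"Zhang San": "zhangsan"},
    ("server-001", "doc-42"): {"first company": "diyigongsi"},
    ("server-002", None): {"second issue": "dierqikan"},
}


def select_mapping(server_id: str, content_id: Optional[str] = None) -> Mapping:
    """Return the content-specific mapping if one exists, else the server default."""
    specific = local_mappings.get((server_id, content_id))
    if specific is not None:
        return specific
    return local_mappings.get((server_id, None), {})
```

For example, `select_mapping("server-001", "doc-42")` picks the content-specific mapping, while an unknown content identifier falls back to the default for that server.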
Optionally, referring to fig. 2, "converting the specified text in the text content into the icon based on the mapping relation between the characters and the icon" in step S14 may specifically include the following steps:
Step S142: determine the attribute name of the cascading style sheet corresponding to the specified text based on the mapping relation.
Cascading Style Sheets (CSS) is a language used to describe the presentation of documents written in HTML or XML (Extensible Markup Language). CSS can not only style a web page statically but also, together with various scripting languages, format the elements of a web page dynamically. In particular, CSS allows pixel-level control over the layout of element positions in a web page, supports almost all font and size styles, and can edit the styles of web page objects and models. Therefore, in this embodiment, CSS pseudo-processing is adopted to replace the specified text with icons, and the attribute name of the cascading style sheet can be regarded as the name corresponding to the icon.
Further, when the text and the icon are replaced, the specified text which needs to be replaced by the corresponding icon is determined based on the mapping relation, and the specified text can be preset specified text based on sensitive words, privacy attributes and the like, such as a person name, a place name, a telephone number and the like.
Taking "Zhang San of the first company publishes a table document in the second issue" as an example, the display end determines the corresponding mapping relation based on the identity of the server that sent the text content, and then reads the local mapping relation. The specified text to be replaced by icons includes "first company", "Zhang San" and "second issue"; the cascading style sheet attribute name corresponding to "first company" is "diyigongsi", that corresponding to "Zhang San" is "zhangsan", and that corresponding to "second issue" is "dierqikan". The attribute names of the cascading style sheet correspond one-to-one with the icons, and the icon for each piece of specified text is determined from its attribute name. Each icon has the same display effect as the specified text it replaces; for example, the icon corresponding to "first company" is a picture whose displayed content is "first company".
Step S144: replace the specified text with the icon corresponding to the attribute name of the cascading style sheet.
It should be understood that the icons may be in any image file format, such as svg or png.
Step S16: display the display content in a page.
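Steps S142 and S144 can be sketched as a single pass over the text content that looks up each piece of specified text and emits an icon placeholder in its place. This is a sketch under assumptions: the "attribute name of the cascading style sheet" is treated here as a CSS class name, and the `<i>` element and the `icon-` prefix are illustrative choices that the text does not specify.

```python
import html
import re
from typing import Dict


def to_display_content(text_content: str, mapping: Dict[str, str]) -> str:
    """Replace each specified text with an icon element; keep the residual text."""
    if not mapping:
        return html.escape(text_content)
    # Try longer specified texts first so they win over their own substrings.
    keys = sorted(mapping, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in keys))
    parts = []
    pos = 0
    for match in pattern.finditer(text_content):
        parts.append(html.escape(text_content[pos:match.start()]))  # residual text
        css_name = mapping[match.group(0)]                          # step S142
        parts.append(f'<i class="icon-{css_name}"></i>')            # step S144
        pos = match.end()
    parts.append(html.escape(text_content[pos:]))
    return "".join(parts)


mapping = {
    "first company": "diyigongsi",
    "Zhang San": "zhangsan",
    "second issue": "dierqikan",
}
display_content = to_display_content(
    "Zhang San of the first company publishes a document in the second issue",
    mapping,
)
```

The resulting markup contains only the residual text as characters; the specified text is present only as class names that a stylesheet would render as icons.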
In the CSS pseudo-processing technique, part of the text is missing from the HTML text output in the response and is instead hidden in a CSS pseudo-class; the user sees the normal presentation only once the CSS is rendered, whereas for a crawler program the output text is incomplete and cannot be crawled. The technique provides good protection both against crawlers that send requests programmatically to obtain the HTML text of a resource and extract important text data from it, and against crawlers that load the document in memory in a browser mode and, after the document has been loaded and rendered, inject a program to read the rendered text.
In order to cooperate with the above anti-crawler method applied to the display end, this embodiment further provides an anti-crawler method applied to the server. Referring to fig. 3, fig. 3 is a schematic flowchart of an anti-crawler method applied to a server according to an embodiment of the present application. The specific steps of the anti-crawler method applied to the server may be as follows:
Step S22: determine the text content to be displayed by the display end and the mapping relation between the characters corresponding to the text content and the icons.
The specified text to be replaced by icons may differ from one text content to another, so either a uniform mapping relation held locally by the server or a mapping relation corresponding to the particular text content can be obtained and sent to the display end.
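To illustrate why a crawler that only parses the HTML text obtains incomplete content, the following sketch runs a naive text extractor over display content in which the specified text has been replaced by empty icon elements. The markup shape (`<i class="icon-...">`) is an assumption used for illustration only.

```python
from html.parser import HTMLParser
from typing import List


class TextExtractor(HTMLParser):
    """Collects only character data, as a simple page-parsing crawler would."""

    def __init__(self) -> None:
        super().__init__()
        self.chunks: List[str] = []

    def handle_data(self, data: str) -> None:
        self.chunks.append(data)


def crawl_text(markup: str) -> str:
    parser = TextExtractor()
    parser.feed(markup)
    parser.close()
    return "".join(parser.chunks)


# Hypothetical display content: the specified text exists only as icons.
page = (
    '<p><i class="icon-zhangsan"></i> of the <i class="icon-diyigongsi"></i>'
    ' publishes a document in the <i class="icon-dierqikan"></i></p>'
)
crawled = crawl_text(page)
```

Here `crawled` contains only the residual text; the person name, company name and issue never appear as characters in the markup.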
It should be understood that the mapping relationship includes the specified text and the icon, and therefore the icon corresponding to the specified text needs to be determined first, and then the following steps may be further included in this embodiment before step S22:
Step S211: generate an icon corresponding to the specified character by using a vector icon library, wherein the display style of the icon is the same as that of the corresponding character.
The specified characters can be the characters of specified text preset based on sensitive words, privacy attributes and the like, such as a person name, a place name or a telephone number.
Step S212: use the ASCII code of the character corresponding to the icon as the attribute name of the cascading style sheet of the icon.
Specifically, a correspondence is established between characters and the cascading style sheet attribute names of their icons. For characters such as English letters and digits, the ASCII code of the character can be used as the cascading style sheet attribute name of the corresponding icon. For Chinese characters, a mapping scheme such as ASCII codes or the pinyin of the Chinese character can be used as the cascading style sheet attribute name of the Chinese character icon. Each character is mapped to its icon according to the mapping relation, so that visually the page appears to display characters while the content actually displayed on the page is icons.
Step S24: send the text content and the mapping relation to the display end, so that the display end converts the specified text in the text content into icons based on the mapping relation between characters and icons, obtains display content consisting of the residual text and the icons, and displays the display content in the page, wherein the residual text is the text in the text content other than the specified text.
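The naming rule of step S212 can be sketched as follows: ASCII characters are named by their code value, and Chinese characters by a pinyin lookup. The `icon-` prefix and the small pinyin table are assumptions for illustration; a real system would need a complete pinyin source.

```python
from typing import Dict

# Hypothetical pinyin table; only the characters needed for the example.
PINYIN: Dict[str, str] = {"张": "zhang", "三": "san"}


def css_name_for(char: str, pinyin_table: Dict[str, str] = PINYIN) -> str:
    """Return the cascading style sheet attribute name for one character."""
    if len(char) != 1:
        raise ValueError("expected a single character")
    if char.isascii():
        return f"icon-{ord(char)}"          # e.g. ASCII code 65 for 'A'
    if char in pinyin_table:
        return f"icon-{pinyin_table[char]}"  # pinyin for Chinese characters
    raise KeyError(f"no attribute name defined for {char!r}")


def build_mapping(chars: str) -> Dict[str, str]:
    """Build the character-to-attribute-name mapping relation."""
    return {c: css_name_for(c) for c in chars}


char_mapping = build_mapping("A7张三")
```

With this rule, `"A"` maps to `icon-65` and `"7"` to `icon-55`, so the attribute name never contains the character it stands for.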
Optionally, when the mapping relationship and the text content are sent, the mapping relationship and the text content may be encrypted, and the specific steps may be as follows: encrypting the text content and the mapping relation based on a symmetric encryption algorithm matched with the display end to obtain encryption information; and sending the encrypted information to a display end.
In this embodiment, the server only needs to determine the text content and send the text content and the mapping relation. This avoids the situation in which a server serving a large number of display ends has to replace specified text with icons across a large volume of text content, which greatly reduces the load on the server and improves anti-crawler processing efficiency.
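The server-side flow of steps S22 and S24 can be sketched as follows, using the earlier statement that the mapping relation needs to be sent only once per display end. The class name, message layout and display end identifiers are assumptions, and encryption is omitted to keep the sketch short.

```python
import json
from typing import Dict, Set


class AntiCrawlerServer:
    """Sends text content, plus the mapping relation on first contact only."""

    def __init__(self, mapping: Dict[str, str]) -> None:
        self.mapping = mapping
        self._has_mapping: Set[str] = set()  # display ends already served

    def build_message(self, display_id: str, text_content: str) -> str:
        message = {"text": text_content}
        if display_id not in self._has_mapping:
            message["mapping"] = self.mapping  # sent once per display end
            self._has_mapping.add(display_id)
        return json.dumps(message, ensure_ascii=False)


server = AntiCrawlerServer({"Zhang San": "zhangsan"})
first = json.loads(server.build_message("display-1", "Zhang San writes"))
second = json.loads(server.build_message("display-1", "Zhang San reads"))
other = json.loads(server.build_message("display-2", "Zhang San reads"))
```

The server performs no replacement itself; it only serializes the text content and, when needed, the mapping relation.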
In order to cooperate with the above anti-crawler method applied to the display end, the embodiment of the application further provides an anti-crawler device applied to the display end. Referring to fig. 4, fig. 4 is a schematic block diagram of an anti-crawler apparatus applied to a display end according to an embodiment of the present disclosure.
The anti-crawler apparatus 30 includes:
a text obtaining module 31, configured to obtain text content from a server;
the text replacement module 32 is configured to convert a specified text in the text content into an icon based on a mapping relationship between the characters and the icon, and obtain display content composed of a remaining text and the icon, where the remaining text is another text except the specified text in the text content;
and a display module 33, configured to display the display content in the page.
Optionally, the text obtaining module 31 is specifically configured to: acquire encrypted information containing the text content from the server; and decrypt the encrypted information using a symmetric encryption algorithm matched with the server to obtain the text content.
Optionally, the anti-crawler device 30 further comprises a mapping relationship obtaining module, configured to: obtain the mapping relationship from the server; and store the mapping relationship in JSON format or list format.
Optionally, the text replacement module 32 is specifically configured to: determine, based on the mapping relationship, the cascading style sheet (CSS) attribute name corresponding to the specified text; and replace the specified text with the icon corresponding to that CSS attribute name.
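As a rough sketch of the two storage formats mentioned above, the following shows one way a display end might persist the mapping as JSON and convert it to and from a list of pairs. The helper names and the file-based storage are assumptions for illustration; the text does not specify where or how the mapping is kept.

```python
import json
import os
import tempfile


def save_mapping_json(mapping, path):
    """Store the character-to-icon mapping as a JSON object."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(mapping, fh, ensure_ascii=False)


def load_mapping_json(path):
    """Read the mapping back from its JSON file."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def mapping_to_list(mapping):
    """List format: one (character, CSS name) pair per entry, sorted."""
    return sorted(mapping.items())


def mapping_from_list(pairs):
    """Rebuild the dictionary form from the list format."""
    return dict(pairs)
```

The JSON form suits direct lookup by character, while the list form keeps a stable order; either can be rebuilt from the other.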
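The replacement performed by the text replacement module can be illustrated with a short sketch. It assumes the specified text is located with a regular expression (here, a telephone-number-like run of digits) and that each icon is emitted as an `<i>` element carrying the CSS name from the mapping; neither detail is stated in the text, and `render_display_content` is a hypothetical name.

```python
import html
import re


def render_display_content(text, mapping, specified_pattern):
    """Return markup in which specified text is replaced by icon elements.

    Remaining text is kept as (escaped) plain text. Characters inside a
    specified span that have no icon in the mapping are also kept as text.
    """
    pieces = []
    last = 0
    for match in re.finditer(specified_pattern, text):
        # Remaining text before the specified span stays readable.
        pieces.append(html.escape(text[last:match.start()]))
        for ch in match.group(0):
            css_name = mapping.get(ch)
            if css_name is not None:
                pieces.append(f'<i class="{css_name}"></i>')
            else:
                pieces.append(html.escape(ch))
        last = match.end()
    pieces.append(html.escape(text[last:]))
    return "".join(pieces)


# Assumed mapping: every digit maps to a CSS name derived from its ASCII code.
digit_mapping = {d: f"icon-{ord(d)}" for d in "0123456789"}
result = render_display_content("Tel: 555-0100", digit_mapping, r"\d[\d-]*\d")
```

With this input the digits no longer appear as characters in the markup, so a crawler reading the page source sees only the label and a sequence of icon elements.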
In order to cooperate with the anti-crawler method applied to the server, the embodiment of the application further provides an anti-crawler device applied to the server. Referring to fig. 5, fig. 5 is a schematic block diagram of an anti-crawler apparatus applied to a server according to an embodiment of the present disclosure.
The anti-crawler apparatus 40 includes:
a content determining module 41, configured to determine text content to be displayed on the display end, and a mapping relationship between characters and icons corresponding to the text content;
and a sending module 42, configured to send the text content and the mapping relationship to the display end, so that the display end converts the specified text in the text content into icons based on the mapping relationship between characters and icons, obtains display content composed of the remaining text and the icons, and displays the display content in the page, where the remaining text is the text in the text content other than the specified text.
Optionally, the anti-crawler apparatus 40 further comprises an icon generating module, configured to: generate, using a vector icon library, an icon corresponding to each specified character, where the display style of the icon is the same as that of the corresponding character; and use the ASCII (American Standard Code for Information Interchange) code of the character corresponding to each icon as the cascading style sheet attribute name of that icon.
Optionally, the sending module 42 is specifically configured to: encrypt the text content and the mapping relationship using a symmetric encryption algorithm matched with the display end to obtain encrypted information; and send the encrypted information to the display end.
The embodiment of the present application further provides an electronic device, which includes a memory and a processor, where the memory stores program instructions, and when the processor reads and runs the program instructions, the processor executes the steps of any one of the anti-crawler methods provided in this embodiment.
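The naming rule above, in which the ASCII code of a character serves as the CSS name of its icon, might look like the following sketch. Several details are assumptions rather than part of the text: the `icon-` prefix (added because an unescaped CSS class name cannot begin with a digit), the hypothetical icon font `anti-crawler-icons`, and the choice to place each glyph at private-use code point 0xE000 plus the ASCII code.

```python
def css_name_for(ch):
    """Derive the CSS name of an icon from the ASCII code of its character."""
    code = ord(ch)
    if code > 127:
        raise ValueError(f"{ch!r} is not an ASCII character")
    # Prefix is an assumption: bare digits are not valid CSS class names.
    return f"icon-{code}"


def css_rule_for(ch, font_family="anti-crawler-icons"):
    """Build a CSS rule that draws the icon for `ch` from an icon font.

    The font name and the private-use code point layout are hypothetical.
    """
    glyph = 0xE000 + ord(ch)
    return (f'.{css_name_for(ch)}::before '
            f'{{ font-family: "{font_family}"; content: "\\{glyph:x}"; }}')


def build_mapping(chars):
    """Mapping from each character to the CSS name of its icon."""
    return {ch: css_name_for(ch) for ch in chars}
```

For example, `css_name_for("1")` yields `icon-49`, since the ASCII code of `1` is 49. Because the rule applies only to ASCII characters, names or place names written in other scripts would need a different naming scheme, which the text does not describe.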
It should be understood that the electronic device may be a Personal Computer (PC), a tablet PC, a smart phone, a Personal Digital Assistant (PDA), or other electronic devices having a logic calculation function.
The embodiment of the application also provides a readable storage medium storing computer program instructions which, when read and executed by a processor, perform the steps of the anti-crawler method.
To sum up, the embodiment of the present application provides an anti-crawler method, an anti-crawler device, an electronic device, and a storage medium, where the anti-crawler method applied to a display end includes: acquiring text content from a server; converting a specified text in the text content into an icon based on the mapping relation between characters and the icon, and obtaining display content consisting of a residual text and the icon, wherein the residual text is other texts in the text content except the specified text; displaying the display content in a page.
In this implementation, the specified text in the text content to be displayed is converted into icons, so that a web crawler cannot accurately crawl the text content, which ensures the anti-crawler function of the page. Meanwhile, the display end performs the text-to-icon conversion, which spares the server from performing the conversion in large batches and improves the efficiency of anti-crawler processing.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus may be implemented in other manners. The apparatus embodiments described above are merely illustrative, and for example, the block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of devices according to various embodiments of the present application. In this regard, each block in the block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams, and combinations of blocks in the block diagrams, can be implemented by special purpose hardware-based systems that perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
In addition, functional modules in the embodiments of the present application may be integrated together to form an independent part, or each module may exist separately, or two or more modules may be integrated to form an independent part.
If the functions are implemented in the form of software functional modules and sold or used as a stand-alone product, they may be stored in a computer-readable storage medium. The present embodiment further provides a readable storage medium storing computer program instructions which, when read and executed by a processor, perform the steps of any one of the anti-crawler methods. Based on such understanding, the technical solution of the present application, or the part of it that contributes over the prior art, may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
The above description is only an example of the present application and is not intended to limit the scope of the present application, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application shall be included in the protection scope of the present application. It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures.
The above description is only for the specific embodiments of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily think of the changes or substitutions within the technical scope of the present application, and shall be covered by the scope of the present application.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising …" does not exclude the presence of additional like elements in a process, method, article, or apparatus that comprises the element.
Claims (9)
1. An anti-crawler method is applied to a display end, and the method comprises the following steps:
acquiring text content from a server;
determining the attribute name of a cascading style sheet corresponding to a specified text based on the mapping relation between characters and icons stored locally, converting the specified text in the text content into the icons corresponding to the attribute names of the cascading style sheet one to one, and obtaining display content consisting of the residual text and the icons, wherein the residual text is other texts except the specified text in the text content; the specified text comprises sensitive information of a person name, a place name and a telephone number;
displaying the display content in a page.
2. The method of claim 1, wherein the obtaining text content comprises:
acquiring encryption information containing the text content from the server;
and decrypting the encrypted information based on a symmetric encryption algorithm matched with the server side to obtain the text content.
3. The method according to claim 1, wherein before the converting the designated text in the text content into the icon based on the mapping relationship between the characters and the icon, the method further comprises:
acquiring the mapping relation from the server;
and storing the mapping relation based on the json format or the list format.
4. An anti-crawler method is applied to a server side, and comprises the following steps:
determining text content to be displayed by a display end and a mapping relation between characters and icons corresponding to the text content;
and sending the text content and the mapping relation stored locally to the display end, wherein the mapping relation is used for determining the attribute name of the cascading style sheet corresponding to the appointed text by the display end based on the mapping relation between the characters and the icons, converting the appointed text in the text content into the icons corresponding to the attribute names of the cascading style sheet one by one, obtaining the display content consisting of the residual text and the icons, and displaying the display content in a page, wherein the residual text is other texts in the text content except the appointed text, and the appointed text comprises sensitive information of a person name, a place name and a telephone number.
5. The method according to claim 4, wherein before the determining the text content required to be displayed on the display end and the mapping relationship between the characters corresponding to the text content and the icons, the method further comprises:
generating an icon corresponding to the designated character by using a vector icon library, wherein the display style of the icon is the same as that of the corresponding character;
and adopting the ASCII code of the character corresponding to the icon as the attribute name of the cascading style sheet of the icon.
6. The method of claim 4, wherein sending the textual content and the mapping relationship from the local storage to the display comprises:
encrypting the text content and the mapping relation based on a symmetric encryption algorithm matched with the display end to obtain encryption information;
and sending the encrypted information to the display terminal.
7. An anti-crawler device, applied to a display side, the device comprising:
the text acquisition module is used for acquiring text contents from the server;
the text replacement module is used for determining the attribute names of the cascading style sheet corresponding to the designated texts based on the mapping relation between the characters and the icons stored locally, converting the designated texts in the text contents into the icons in one-to-one correspondence with the attribute names of the cascading style sheet, and obtaining display contents consisting of the residual texts and the icons, wherein the residual texts are other texts except the designated texts in the text contents; the specified text comprises sensitive information of a person name, a place name and a telephone number; and the display module is used for displaying the display content in a page.
8. An anti-crawler device, applied to a server side, the device comprising:
the content determining module is used for determining the text content to be displayed by the display end and the mapping relation between the characters corresponding to the text content and the icons;
and the sending module is used for sending the text content and the mapping relation stored locally to the display end, converting the specified text in the text content into an icon by the display end based on the mapping relation between the characters and the icon, obtaining display content consisting of residual text and the icon, and displaying the display content in a page, wherein the residual text is other texts in the text content except the specified text.
9. An electronic device, comprising a memory and a processor, the memory having stored therein program instructions which, when executed by the processor, perform the steps of the method of any one of claims 1-6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010527694.XA CN111683098B (en) | 2020-06-10 | 2020-06-10 | Anti-crawler method and device, electronic equipment and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010527694.XA CN111683098B (en) | 2020-06-10 | 2020-06-10 | Anti-crawler method and device, electronic equipment and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111683098A CN111683098A (en) | 2020-09-18 |
CN111683098B CN111683098B (en) | 2022-12-23 |
Family
ID=72435350
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010527694.XA Active CN111683098B (en) | 2020-06-10 | 2020-06-10 | Anti-crawler method and device, electronic equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111683098B (en) |
Families Citing this family (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112422543A (en) * | 2020-11-09 | 2021-02-26 | 建信金融科技有限责任公司 | Anti-crawler method and device |
CN112650905A (en) * | 2020-12-22 | 2021-04-13 | 深圳壹账通智能科技有限公司 | Anti-crawler method and device based on label, computer equipment and storage medium |
CN112769787A (en) * | 2020-12-29 | 2021-05-07 | 深圳一科互联有限公司 | Website system network security anti-crawler calculation method and device |
CN113987569A (en) * | 2021-10-14 | 2022-01-28 | 武汉联影医疗科技有限公司 | Anti-crawler method and device, computer equipment and storage medium |
CN116976280B (en) * | 2023-09-22 | 2023-12-01 | 北京国科恒通科技股份有限公司 | Vector icon-based power grid GIS graphic primitive rendering method and device |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107766237A (en) * | 2017-09-22 | 2018-03-06 | 北京锐安科技有限公司 | Method of testing, device, server and the storage medium of web crawlers |
CN110069688A (en) * | 2019-03-16 | 2019-07-30 | 平安城市建设科技(深圳)有限公司 | Page display method, server, storage medium and the device of anti-crawler |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107341160B (en) * | 2016-05-03 | 2020-09-01 | 北京京东尚科信息技术有限公司 | Crawler intercepting method and device |
CN110851682A (en) * | 2019-10-17 | 2020-02-28 | 上海易点时空网络有限公司 | Text anti-crawler method, server and display terminal |
CN111008348A (en) * | 2019-11-28 | 2020-04-14 | 盛业信息科技服务(深圳)有限公司 | Anti-crawler method, terminal, server and computer readable storage medium |
-
2020
- 2020-06-10 CN CN202010527694.XA patent/CN111683098B/en active Active
Also Published As
Publication number | Publication date |
---|---|
CN111683098A (en) | 2020-09-18 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111683098B (en) | Anti-crawler method and device, electronic equipment and storage medium | |
JP6206866B2 (en) | Apparatus and method for holding obfuscated data in server | |
US6601108B1 (en) | Automatic conversion system | |
Renaud et al. | How to make privacy policies both GDPR-compliant and usable | |
US9754120B2 (en) | Document redaction with data retention | |
US8042036B1 (en) | Generation of a URL containing a beginning and an ending point of a selected mark-up language document portion | |
US8910036B1 (en) | Web based copy protection | |
US20110258535A1 (en) | Integrated document viewer with automatic sharing of reading-related activities across external social networks | |
US8887290B1 (en) | Method and system for content protection for a browser based content viewer | |
US20110179352A1 (en) | Systems and methods for providing content aware document analysis and modification | |
US20210149842A1 (en) | System and method for display of document comparisons on a remote device | |
IL129633A (en) | Automatic conversion system | |
Mir | Copyright for web content using invisible text watermarking | |
Taleby Ahvanooey et al. | An innovative technique for web text watermarking (AITW) | |
US9665543B2 (en) | System and method for reference validation in word processor documents | |
CN111859853A (en) | Webpage text encryption and decryption method based on random font | |
CN111666466A (en) | Method, system, apparatus and computer-readable storage medium for preventing crawler | |
CN117957561A (en) | Network font service method of font service system | |
CN112068826A (en) | Control method, system, electronic device and storage medium for text input | |
JP3941253B2 (en) | Hypertext system and method for handling hypertext | |
JP5907130B2 (en) | Information processing device | |
US20100017708A1 (en) | Information output apparatus, information output method, and recording medium | |
JP3676120B2 (en) | Text electronic authentication apparatus, method, and recording medium recording text electronic authentication program | |
US20230409833A1 (en) | Information processing system and information processing method | |
CN115099200B (en) | Tamper-proof text processing method and device and computer equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||