CN112417305A - Website sensitive word detection system and method - Google Patents
Website sensitive word detection system and method
- Publication number
- CN112417305A
- Authority
- CN
- China
- Prior art keywords
- information
- website
- module
- sensitive
- title
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9535—Search customisation based on user profiles and personalisation
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9538—Presentation of query results
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Artificial Intelligence (AREA)
- Information Transfer Between Computers (AREA)
Abstract
The invention discloses a system and a method for detecting website sensitive words, which belong to the technical field of website information maintenance and comprise an input module, an image conversion module, a detection module, a path display module and an information display module; the input module receives an input domain name, an IP address and a title with sensitive words; the image conversion module receives the title with the sensitive words output by the input module, and converts the title into an image according to different fonts; the detection module detects the website according to the domain name, the IP address, the title and the picture; the path display module displays the received file path containing the title or the picture of the website with the sensitive information; the information display module displays the received domain name and IP address of the website with the sensitive information and the information with the title or the picture in the website.
Description
Technical Field
The invention relates to the technical field of website information maintenance, in particular to a website sensitive word detection system and method.
Background
A growing number of ambiguous sensitive words now appear on websites. On most websites, sensitive words generally refer to words with a sensitive political tendency (or a tendency against a political party), a violent tendency, or unhealthy content, as well as uncivil language; some websites also define special sensitive words that apply only to that site, according to its own circumstances. Many websites are blocked because of sensitive words, which causes economic loss. In other cases, hackers insert sensitive words into pop-up boxes, and many visitors browsing the website trigger the pop-up and see the sensitive words, which may stir up public opinion or disturb social order and carries serious legal liability.
In the prior art, reference may be made to Chinese patent application publication No. CN110750981A, which discloses a high-accuracy website sensitive word detection method based on machine learning. In that method, the documents to be detected are first rule-matched against a sensitive word database to obtain a document set containing sensitive words; training data is processed and learned to output a machine learning model; and the document set is then input into the model to obtain the website sensitive word detection result. By training the model with a machine learning algorithm, first applying sensitive word rule matching to the crawled web pages and then applying automatic machine learning analysis only to the pages output by rule matching, the method reduces the volume of data the model must predict, improves detection speed and accuracy, and finally obtains, through statistical calculation, the likelihood that a page contains sensitive words.
The above prior-art solution has the following drawback: although methods for detecting and intercepting website sensitive words exist, sensitive words are now maliciously posted in many different ways, and it is difficult to remove all of them by relying on existing text recognition alone.
Disclosure of Invention
The invention aims to provide a website sensitive word detection method that can detect sensitive words sent in formats such as pictures, effectively improving the accuracy of sensitive word removal and enlarging the detection range of sensitive words.
The technical purpose of the invention is realized by the following technical scheme:
a website sensitive word detection method comprises the following steps:
firstly, inputting a domain name and an IP address to be detected, and filling a title with sensitive words to be detected;
secondly, converting the title into an image according to different fonts;
thirdly, detecting a title and a picture in a website corresponding to the domain name and the IP address;
fourthly, displaying the detected file path containing the title or the picture of the website;
and fifthly, displaying the domain name with the sensitive information, the IP address and the information with the title or the picture in the domain name.
With this scheme, after the user enters the sensitive words to be detected and the detection range, the method automatically displays the information containing the sensitive words together with its file path, so that the user can conveniently process that information. During detection, the sensitive words in picture format can be used to search for information in formats such as pictures, animated images and videos maliciously uploaded by others, which effectively enlarges the detection range of sensitive words and improves the accuracy of their removal.
The invention is further configured such that the method further comprises:
adding blank characters between the characters of the sensitive word;
and in the third step, detecting the sensitive word with the added blank characters in the website corresponding to the domain name and the IP address, wherein each blank character matches any character during the search.
With this scheme, the detection range of sensitive words is further expanded, which prevents people from evading detection by splitting sensitive words with simple characters such as spaces.
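The blank-character matching described above can be sketched with a regular expression. This is only an illustration under stated assumptions: the function name `build_suspected_pattern` and the `max_gap` parameter are hypothetical, and the patent does not say how many characters a blank position may absorb, so the sketch lets each blank match zero up to `max_gap` arbitrary characters.

```python
import re


def build_suspected_pattern(word: str, max_gap: int = 1) -> re.Pattern:
    """Build a pattern for a sensitive word with a wildcard "blank" between characters.

    Assumption: each blank matches between 0 and max_gap arbitrary characters,
    so the unmodified word is matched as well as split variants.
    """
    if not word:
        raise ValueError("word must not be empty")
    gap = ".{0,%d}" % max_gap
    # Escape each character so that regex metacharacters in the word stay literal.
    return re.compile(gap.join(re.escape(ch) for ch in word), re.DOTALL)


if __name__ == "__main__":
    pattern = build_suspected_pattern("bad")
    for text in ("bad", "b a d", "b*a*d", "b--a d"):
        print(repr(text), bool(pattern.search(text)))
```

Raising `max_gap` widens the net in the same way that a higher fuzziness widens picture matching, at the cost of more false positives.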
The invention is further configured such that the method further comprises:
sixthly, generating data from the displayed information, creating a document, and storing the data in the specified document.
With this scheme, the user can review the information in the document at any time, so that information containing sensitive words that the user has no time to handle, or cannot handle promptly, can be processed at a convenient time.
The invention is further configured such that the method further comprises:
seventhly, after the document is opened, selecting any entry consisting of a domain name with sensitive information, its IP address, and the information in which the title contains the sensitive word, and automatically querying the corresponding file path according to the selected information.
With this scheme, the user learns the file path directly upon selecting information containing sensitive words, which makes the information convenient to process.
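Steps six and seven (storing the displayed information in a document, then looking up a file path from a selected entry) might look like the following minimal sketch. The JSON layout, the field names (`domain`, `ip_address`, `matched_text`, `file_path`) and both function names are assumptions made for illustration; the patent does not specify a document format.

```python
import json
from pathlib import Path
from typing import Dict, List

Finding = Dict[str, str]  # hypothetical record: domain, ip_address, matched_text, file_path


def save_findings(findings: List[Finding], document_path: str) -> None:
    """Step six: write the displayed information into the specified document."""
    text = json.dumps(findings, ensure_ascii=False, indent=2)
    Path(document_path).write_text(text, encoding="utf-8")


def lookup_file_paths(document_path: str, domain: str, matched_text: str) -> List[str]:
    """Step seven: return the file paths recorded for the selected entry."""
    findings = json.loads(Path(document_path).read_text(encoding="utf-8"))
    return [
        f["file_path"]
        for f in findings
        if f["domain"] == domain and f["matched_text"] == matched_text
    ]
```

Keeping the file path in the same record as the displayed information is what makes the later lookup a simple filter rather than a second scan of the website.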
The invention is further configured such that the method further comprises:
in the third step, setting the fuzziness of picture detection before detecting pictures, wherein the higher the fuzziness, the larger the range of pictures detected.
With this scheme, the user can set the detection fuzziness according to actual conditions and thereby control the retrieval range for pictures and similar information. This improves detection accuracy: information containing sensitive words is detected while false detection of normal information is avoided as far as possible.
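One way to read the fuzziness setting is as a tolerance on how much a candidate picture may differ from the rendered sensitive-word picture. The sketch below assumes that interpretation; the bitmap representation (rows of `#` and `.`), the pixel-difference measure and the names `difference_ratio` and `is_match` are illustrative choices, not the method defined in the patent.

```python
from typing import Sequence

Bitmap = Sequence[str]  # rows of '#' (ink) and '.' (background), all the same width


def difference_ratio(a: Bitmap, b: Bitmap) -> float:
    """Fraction of pixel positions at which two equally sized bitmaps differ."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("bitmaps must have the same dimensions")
    total = sum(len(row) for row in a)
    if total == 0:
        raise ValueError("bitmaps must not be empty")
    differing = sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))
    return differing / total


def is_match(reference: Bitmap, candidate: Bitmap, fuzziness: float) -> bool:
    """Accept the candidate if it differs from the reference by at most `fuzziness`.

    Assumption: fuzziness lies in [0, 1]; 0 demands an identical picture, and
    larger values accept more pictures, which widens the detected range.
    """
    if not 0.0 <= fuzziness <= 1.0:
        raise ValueError("fuzziness must be between 0 and 1")
    return difference_ratio(reference, candidate) <= fuzziness
```

With this reading, every picture accepted at a low fuzziness is also accepted at a higher one, which matches the statement that a higher fuzziness yields a larger detected range.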
The invention also aims to provide a website sensitive word detection system that can detect sensitive words sent in formats such as pictures, effectively improving the accuracy of sensitive word removal and enlarging the detection range of sensitive words.
The technical purpose of the invention is realized by the following technical scheme:
a website sensitive word detection system comprises an input module, an image conversion module, a detection module, a path display module and an information display module;
the input module receives an input domain name, an IP address and a title with sensitive words and outputs the domain name, the IP address and the title with the sensitive words;
the image conversion module receives the title with the sensitive words output by the input module, converts the title into an image according to different fonts and outputs the image;
the detection module receives a domain name, an IP address and a title output by the input module and a picture output by the image conversion module, locks a website according to the received domain name and the IP address, detects information which is the same as or similar to the received title and the picture on the locked website, transmits a file path containing the title or the picture of the website with the detected sensitive information to the path display module, and transmits the domain name and the IP address of the website with the detected sensitive information and the information with the title or the picture in the website to the information display module;
the path display module displays the received file path containing the title or the picture of the website with the sensitive information;
and the information display module displays the received domain name and IP address of the website with the sensitive information and the information with the title or picture in the website.
With this scheme, after the user enters the sensitive words to be detected and the detection range into the system, the system automatically detects the sensitive words within the set range, so that the user can conveniently process the information containing them. During detection, the system can use the sensitive words in picture format to search for information in formats such as pictures, animated images and videos maliciously uploaded by others, which effectively enlarges the detection range of sensitive words and improves the accuracy of their removal.
The invention is further configured such that the system further comprises a word processing module; the word processing module receives the title with the sensitive words output by the input module, adds blank characters between the characters of the sensitive word to form a suspected sensitive word, and transmits the suspected sensitive word to the detection module;
the detection module detects information matching the suspected sensitive word on the locked website, wherein each blank character matches any character during detection.
With this scheme, the word processing module further expands the detection range of sensitive words, which prevents people from evading detection by splitting sensitive words with simple characters such as spaces.
The invention is further configured such that the system further comprises a storage module; the storage module receives input information, creates a document and stores the received information in the document; the path display module transmits the file path to the storage module for storage, and the information display module transmits its information to the storage module for storage.
With this scheme, the user can review the information in the document at any time through the storage module, so that information containing sensitive words that the user has no time to handle, or cannot handle promptly, can be processed at a convenient time.
The invention is further configured such that the system further comprises an automatic search module; the automatic search module retrieves the document stored in the storage module according to a received instruction, selects information stored in the document, and automatically queries and displays the corresponding file path according to the selected information.
With this scheme, when the user opens the document and selects information containing sensitive words, the automatic search module automatically finds the corresponding file path and displays it to the user, which makes the information convenient to process.
The invention is further configured such that the detection module receives externally input information and adjusts the fuzziness of picture detection according to that information, wherein the higher the fuzziness, the larger the range of pictures detected.
With this scheme, the user can set the detection fuzziness at the detection module according to actual conditions and thereby control the retrieval range for pictures and similar information. This improves detection accuracy: information containing sensitive words is detected without falsely detecting too much normal information.
In conclusion, the invention has the following beneficial effects:
1. When detecting sensitive words, the method and system can use the sensitive words in picture format to search for information in formats such as pictures, animated images and videos maliciously uploaded by others, which effectively enlarges the detection range of sensitive words and improves the accuracy of their removal;
2. The word processing module further expands the detection range of sensitive words, which prevents people from evading detection by splitting sensitive words with simple characters such as spaces.
Drawings
Fig. 1 is an overall system block diagram of the second embodiment.
In the figure: 1. input module; 2. image conversion module; 3. word processing module; 4. detection module; 5. path display module; 6. information display module; 7. storage module; 8. automatic search module.
Detailed Description
Embodiment one: a website sensitive word detection method, with the following specific steps:
Step one: input the domain name and IP address to be detected, and fill in the title containing the sensitive words to be detected.
Step two: convert the title into pictures according to different fonts, and add blank characters between the characters of the sensitive word.
Step three: the user sets the fuzziness of picture detection; the higher the fuzziness, the larger the range of pictures detected. After the setting is finished, the title, the pictures and the sensitive word with added blank characters are detected in the website corresponding to the domain name and the IP address, wherein each blank character matches any character during the search.
Step four: display the file path of the detected website content containing the title or the picture.
Step five: display the domain name and IP address with sensitive information, and the information containing the title or the picture under that domain name.
Step six: generate data from the displayed information, create a document, and store the data in the specified document. The user can review the information in the document at any time, so that information containing sensitive words that the user has no time to handle, or cannot handle promptly, can be processed at a convenient time.
Step seven: after the document is opened, select any entry consisting of a domain name with sensitive information, its IP address, and the information in which the title contains the sensitive word; the corresponding file path is queried automatically according to the selected information. The user learns the file path directly upon selecting information containing sensitive words, which makes the information convenient to process.
After the user enters the sensitive words to be detected and the detection range, the method automatically displays the information containing the sensitive words together with its file path, so that the user can conveniently process that information. During detection, the sensitive words in picture format can be used to search for information in formats such as pictures, animated images and videos maliciously uploaded by others, and the added blank characters prevent sensitive words from evading detection by being split with simple characters such as spaces. The detection range of sensitive words is thereby effectively enlarged and the accuracy of their removal improved.
Embodiment two: a website sensitive word detection system, as shown in Fig. 1, comprising an input module 1, an image conversion module 2, a word processing module 3, a detection module 4, a path display module 5, an information display module 6, a storage module 7 and an automatic search module 8.
As shown in fig. 1, the input module 1 receives an input domain name, an IP address, and a title with a sensitive word and outputs the domain name, the IP address, and the title with the sensitive word. The image conversion module 2 receives the title with the sensitive words output by the input module 1, and the image conversion module 2 converts the title into pictures according to different fonts and outputs the pictures. The word processing module 3 receives the title with the sensitive words output by the input module 1, the word processing module 3 adds blank characters between each word in the sensitive words to form suspected sensitive words, and transmits the suspected sensitive words to the detection module 4.
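The per-font conversion performed by the image conversion module can be illustrated with a toy renderer. Everything here is hypothetical: the two 3x5 glyph tables stand in for real fonts, and the names `render_title` and `render_in_all_fonts` are invented for the sketch. A real system would rasterise the title with actual font files.

```python
from typing import Dict, List

Glyphs = Dict[str, List[str]]  # character -> five rows of three pixels

# Two hypothetical 3x5 bitmap "fonts" covering only the letters used below.
FONT_PLAIN: Glyphs = {
    "H": ["#.#", "#.#", "###", "#.#", "#.#"],
    "I": ["###", ".#.", ".#.", ".#.", "###"],
}
FONT_BOLD: Glyphs = {
    "H": ["#.#", "###", "###", "###", "#.#"],
    "I": ["###", "###", ".#.", "###", "###"],
}


def render_title(title: str, glyphs: Glyphs) -> List[str]:
    """Render the title as a bitmap, one background column between characters."""
    missing = [ch for ch in title if ch not in glyphs]
    if missing:
        raise ValueError("font has no glyph for: %r" % missing)
    return [".".join(glyphs[ch][row] for ch in title) for row in range(5)]


def render_in_all_fonts(title: str, fonts: Dict[str, Glyphs]) -> Dict[str, List[str]]:
    """Produce one picture of the title per font, as the module description requires."""
    return {name: render_title(title, glyphs) for name, glyphs in fonts.items()}


if __name__ == "__main__":
    pictures = render_in_all_fonts("HI", {"plain": FONT_PLAIN, "bold": FONT_BOLD})
    for name, rows in pictures.items():
        print(name)
        print("\n".join(rows))
```

Rendering the same title in several fonts gives the detection module several reference pictures, so a sensitive word posted as an image is less likely to be missed because of its typeface.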
As shown in Fig. 1, the detection module 4 receives externally input information and adjusts the fuzziness of picture detection accordingly; the higher the fuzziness, the larger the range of pictures detected. The user can set the detection fuzziness at the detection module 4 according to actual conditions and thereby control the retrieval range for pictures and similar information. This improves detection accuracy: information containing sensitive words is detected while false detection of normal information is avoided as far as possible.
As shown in fig. 1, the detection module 4 receives the domain name, the IP address and the title output by the input module 1, the picture output by the image conversion module 2, and the suspected sensitive word output by the word processing module 3. The detection module 4 locks the website according to the received domain name and the IP address, and detects information that is the same as or similar to the received title, picture, and suspected sensitive word on the locked website, where the blank character is an arbitrary character during detection. The detection module 4 transmits the file path containing the title or the picture of the website with the detected sensitive information to the path display module 5. The detection module 4 transmits the domain name and the IP address of the website with the detected sensitive information and the information with the title or the picture in the website to the information display module 6.
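The data flow out of the detection module (file paths to the path display module, domain name, IP address and matching information to the information display module) can be sketched as follows. This sketch covers text matching only and treats the locked website as an in-memory mapping from file path to page text; that mapping, the `Finding` record and all function names are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Finding:
    domain: str
    ip_address: str
    file_path: str
    matched_text: str


def detect(domain: str, ip_address: str, site_files: Dict[str, str], title: str) -> List[Finding]:
    """Scan every file of the locked website for the title (exact text match only)."""
    return [
        Finding(domain, ip_address, file_path, title)
        for file_path, content in sorted(site_files.items())
        if title in content
    ]


def paths_for_display(findings: List[Finding]) -> List[str]:
    """What the path display module would receive."""
    return [f.file_path for f in findings]


def info_for_display(findings: List[Finding]) -> List[Tuple[str, str, str]]:
    """What the information display module would receive."""
    return [(f.domain, f.ip_address, f.matched_text) for f in findings]
```

Picture matching and suspected-sensitive-word matching would add further checks inside `detect`, but the split of each finding into a path view and an information view stays the same.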
As shown in Fig. 1, the path display module 5 displays the received file path of the website content that contains the title or the picture with sensitive information. The information display module 6 displays the received domain name and IP address of the website with sensitive information, and the information in the website that contains the title or the picture. After the user enters the sensitive words to be detected and the detection range into the system, the system automatically displays the information containing the sensitive words together with its file path, so that the user can conveniently process that information. During detection, the sensitive words in picture format can be used to search for information in formats such as pictures, animated images and videos maliciously uploaded by others, and the added blank characters prevent sensitive words from evading detection by being split with simple characters such as spaces. The system thereby effectively enlarges the detection range of sensitive words and improves the accuracy of their removal.
As shown in Fig. 1, the storage module 7 receives input information, creates a document and stores the received information in the document; the path display module 5 transmits the file path to the storage module 7 for storage, and the information display module 6 transmits its information to the storage module 7 for storage. The automatic search module 8 retrieves the document stored in the storage module 7 according to a received instruction, selects information stored in the document, and automatically queries and displays the corresponding file path according to the selected information. When the user opens the document and selects information containing sensitive words, the automatic search module 8 automatically finds the corresponding file path and displays it to the user, which makes the information convenient to process.
The embodiments described above are preferred embodiments of the invention and do not limit its scope of protection; accordingly, all equivalent changes made according to the structure, shape and principle of the invention fall within the scope of protection of the invention.
Claims (10)
1. A website sensitive word detection method is characterized by comprising the following steps:
firstly, inputting a domain name and an IP address to be detected, and filling a title with sensitive words to be detected;
secondly, converting the title into an image according to different fonts;
thirdly, detecting a title and a picture in a website corresponding to the domain name and the IP address;
fourthly, displaying the detected file path containing the title or the picture of the website;
and fifthly, displaying the domain name with the sensitive information, the IP address and the information with the title or the picture in the domain name.
2. The website sensitive word detection method according to claim 1, further comprising:
adding blank characters among all characters in the sensitive words;
and thirdly, detecting the sensitive words after adding the blank characters in the website corresponding to the domain name and the IP address, wherein the blank characters are any characters during searching.
3. The website sensitive word detection method according to claim 1, further comprising:
and sixthly, generating data from the displayed information, establishing a document, and storing the data into the specified document.
4. The website sensitive word detection method according to claim 3, further comprising:
and seventhly, after the document is opened, selecting any domain name with sensitive information, IP address and information that the title of the domain name contains sensitive words, and automatically inquiring a corresponding file path according to the selected information.
5. The website sensitive word detection method according to claim 1, further comprising:
in the third step, setting the fuzziness of picture detection before detecting pictures, wherein the higher the fuzziness, the larger the range of pictures detected.
6. A website sensitive word detection system is characterized in that: the system comprises an input module (1), an image conversion module (2), a detection module (4), a path display module (5) and an information display module (6);
the input module (1) receives an input domain name, an IP address and a title with sensitive words and outputs the domain name, the IP address and the title with the sensitive words;
the image conversion module (2) receives the title with the sensitive words output by the input module (1), and the image conversion module (2) converts the title into an image according to different fonts and outputs the image;
the detection module (4) receives a domain name, an IP address and a title output by the input module (1) and a picture output by the image conversion module (2), the detection module (4) locks a website according to the received domain name and the IP address and detects information which is the same as or similar to the received title and picture on the locked website, the detection module (4) transmits a file path containing the title or the picture of the detected website with sensitive information to the path display module (5), and the detection module (4) transmits the domain name and the IP address of the detected website with sensitive information and the information with the title or the picture in the website to the information display module (6);
the path display module (5) displays the received file path containing the title or the picture of the website with the sensitive information;
and the information display module (6) displays the received domain name and IP address of the website with the sensitive information and the information with the title or picture in the website.
7. The website sensitive word detection system of claim 6, further comprising: the word processing module (3) receives the title with the sensitive words output by the input module (1), and the word processing module (3) adds blank characters among all the characters in the sensitive words to form suspected sensitive words and transmits the suspected sensitive words to the detection module (4);
the detection module (4) detects the same information as the suspected sensitive words on the locked website, and blank characters are any characters during detection.
8. The website sensitive word detection system of claim 6, further comprising: the storage module (7) receives input information and establishes a document, the received information is stored in the document, the path display module (5) transmits a file path to the storage module (7) for storage, and the information display module (6) transmits the information to the storage module (7) for storage.
9. The website sensitive word detection system of claim 8, further comprising: the automatic searching module (8) calls the documents stored in the storage module (7) according to the received instructions and selects the information stored in the documents, and the automatic searching module (8) automatically inquires and displays the corresponding file path according to the selected information.
10. The website sensitive word detection system of claim 6, wherein: the detection module (4) receives externally input information and adjusts the fuzziness of picture detection according to the input information, wherein the higher the fuzziness, the larger the range of pictures detected.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202011454305.1A CN112417305A (en) | 2020-12-10 | 2020-12-10 | Website sensitive word detection system and method |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202011454305.1A CN112417305A (en) | 2020-12-10 | 2020-12-10 | Website sensitive word detection system and method |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| CN112417305A (en) | 2021-02-26 |
Family
ID=74775907
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202011454305.1A Pending CN112417305A (en) | 2020-12-10 | 2020-12-10 | Website sensitive word detection system and method |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN112417305A (en) |
- 2020-12-10: CN application CN202011454305.1A (publication CN112417305A), status: Pending
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| US6169998B1 (en) | Method of and a system for generating multiple-degreed database for images | |
| JP5226095B2 (en) | Local item extraction | |
| US12159452B2 (en) | Automatically predicting text in images | |
| US20090300003A1 (en) | Apparatus and method for supporting keyword input | |
| JPH07282088A (en) | Matching device and matching method | |
| KR100930249B1 (en) | Apparatus and method for searching the Internet using information obtained from images | |
| CN109272440B (en) | Thumbnail generation method and system combining text and image content | |
| WO2016057238A1 (en) | Linking thumbnail of image to web page | |
| JP2010509794A (en) | Improved mobile communication terminal | |
| CN111104028B (en) | Method, device, equipment and storage medium for topic determination | |
| US6535652B2 (en) | Image retrieval apparatus and method, and computer-readable memory therefor | |
| JP2011065255A (en) | Data processing apparatus, data name generation method and computer program | |
| CN112417305A (en) | Website sensitive word detection system and method | |
| JP7651962B2 (en) | Information processing device, information processing system, information processing method, and program | |
| JP4266240B1 (en) | Item judgment system and item judgment program | |
| KR19990016894A (en) | How to search video database | |
| CN109783735A (en) | Method and device for acquiring content based on user corpus | |
| US5361204A (en) | Searching for key bit-mapped image patterns | |
| CN115565193A (en) | Questionnaire information input method and device, electronic equipment and storage medium | |
| JP2007188410A (en) | Electronic dictionary device, electronic dictionary search method, and electronic dictionary program | |
| JPH08180064A (en) | Document retrieval method and document filing device | |
| KR100540735B1 (en) | Subtitle Character-based Image Indexing Method | |
| JP6425989B2 (en) | Character recognition support program, character recognition support method, and character recognition support device | |
| EP4379573A1 (en) | Computer implemented method for an automated search of an article of a printed medium | |
| JP3241854B2 (en) | Automatic word spelling correction device |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20210226 |