
CN119232498A - Domain name identification library updating method, system, domain name identification method and server - Google Patents

Info

Publication number
CN119232498A
Authority
CN
China
Prior art keywords
domain name
mapping
identified
domain
library
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202411747484.6A
Other languages
Chinese (zh)
Inventor
陈立
王东泉
张俊安
马源
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou Maxnet Network Safety Technology Co ltd
Original Assignee
Suzhou Maxnet Network Safety Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou Maxnet Network Safety Technology Co ltd
Priority to CN202411747484.6A
Publication of CN119232498A
Legal status: Pending

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/08 Network architectures or network communication protocols for network security for authentication of entities
    • H04L63/0876 Network architectures or network communication protocols for network security for authentication of entities based on the identity of the terminal or configuration, e.g. MAC address, hardware or software configuration or device fingerprint
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L61/00 Network arrangements, protocols or services for addressing or naming
    • H04L61/45 Network directories; Name-to-address mapping
    • H04L61/4505 Network directories; Name-to-address mapping using standardised directories; using standardised directory access protocols
    • H04L61/4511 Network directories; Name-to-address mapping using standardised directories; using standardised directory access protocols using domain name system [DNS]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L9/00 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
    • H04L9/40 Network security protocols

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Computer Security & Cryptography (AREA)
  • Power Engineering (AREA)
  • Computer Hardware Design (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Machine Translation (AREA)

Abstract

The present invention discloses a domain name identification library updating method and system, a domain name identification method, and a server. When the domain name identification library is updated, each domain name to be identified is mapped and the result is compared with the library to determine which mapped domain name in the library it equals. Whether the domain name to be identified is legal can then be determined from the types of the domain names corresponding to the matched mapped domain name, which meets the need to decide legality automatically while the library is being updated. Because domain names are mapped using an ordered list and a mapping dictionary, domain name similarity can be detected quickly; no complex mathematical principles or calculation processes are required, so the method is easy to develop, computationally efficient and fast. It can handle the identification of large batches of domain names and helps increase the update rate of the domain name identification library.

Description

Domain name identification library updating method and system, domain name identification method and server
Technical Field
The invention relates to the technical field of network security, and in particular to a domain name identification library updating method and system, a domain name identification method, and a server.
Background
A domain name consists of two or more groups of characters (ASCII or national-language characters) separated by periods. The rightmost group is called the top-level domain name or first-level domain name, the second group from the right is called the second-level domain name, the third group from the right is called the third-level domain name, and so on.
Top-level domain names fall into three categories: country and region top-level domain names, e.g., .cn for China and .jp for Japan; generic top-level domain names, e.g., .com for businesses and .net for network providers; and new top-level domain names, such as general-purpose ones.
Illegal domain names, and domain names that counterfeit those of others, are frequently used on the Internet for illegal purposes, so detecting whether a domain name is legal is very important.
In domain name identification, as shown in patent document CN112751804B, all or part of the domain name to be identified is usually compared with an established domain name library.
The domain name library is therefore the basis for accurate identification, and its data must be updated promptly. Existing updating methods, however, can only add newly acquired domain names to the library and cannot automatically determine whether each newly added domain name is legal.
Disclosure of Invention
The invention aims to solve the above problems in the prior art and provides a domain name identification library updating method and system, a domain name identification method, and a server.
The aim of the invention is achieved by the following technical scheme:
The domain name identification library updating method comprises the following steps:
S1, acquiring a batch of domain names to be identified;
S2, identifying each domain name to be identified according to the following process:
S21, determining the mapped domain name to be identified that corresponds to the domain name to be identified, according to a determined ordered list and mapping method;
S22, comparing the mapped domain name to be identified with the mapped domain names in the domain name identification library to determine whether it matches one of them; if not, executing S23; if so, executing S24;
S23, adding the domain name to be identified to a domain name library to be classified;
S24, determining whether all domain names corresponding to the matched mapped domain name are of the same type; if so, executing S25; if not, executing S26; the type of a domain name indicates whether it is a legal domain name or an illegal domain name;
S25, adding the domain name to be identified to the domain name identification library and setting its type to the type shared by all domain names corresponding to the matched mapped domain name;
S26, determining whether the proportion of legal domain names among all domain names corresponding to the matched mapped domain name reaches a first threshold; if not, executing S27; if so, executing S28;
S27, adding the domain name to be identified to the domain name library to be classified;
S28, adding the domain name to be identified to the domain name identification library, setting its type to legal domain name, and adding an in-doubt mark to it;
S3, after all domain names to be identified have been identified, determining the proportion of in-doubt domain names for each mapped domain name in the domain name identification library, that is, the proportion of domain names carrying the in-doubt mark among all domain names corresponding to that mapped domain name; when this proportion exceeds a second threshold for a mapped domain name, transferring the domain names carrying the in-doubt mark among all domain names corresponding to that mapped domain name from the domain name identification library to the domain name library to be classified, thereby obtaining an updated domain name library to be classified and an updated domain name identification library.
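The decision flow of S22 to S28 for a single domain name can be sketched as follows. This is only an illustrative sketch: the `Entry` record, the `classify` function, the dictionary-of-lists layout of the library and the default value of `first_threshold` are assumptions made for the example and are not specified in the text.

```python
from dataclasses import dataclass

@dataclass
class Entry:
    domain: str
    legal: bool            # True = legal domain name, False = illegal domain name
    in_doubt: bool = False

def classify(domain, mapped, library, pending, first_threshold=0.8):
    """Apply S22-S28 to one domain name and return the step that handled it.

    library: dict mapping a mapped domain name to a list of Entry (assumed layout)
    pending: list standing in for the domain name library to be classified
    """
    matches = library.get(mapped)
    if not matches:                                   # S22: no match -> S23
        pending.append(domain)
        return "S23"
    types = {entry.legal for entry in matches}
    if len(types) == 1:                               # S24: types consistent -> S25
        matches.append(Entry(domain, legal=types.pop()))
        return "S25"
    legal_share = sum(entry.legal for entry in matches) / len(matches)
    if legal_share < first_threshold:                 # S26: threshold not reached -> S27
        pending.append(domain)
        return "S27"
    matches.append(Entry(domain, legal=True, in_doubt=True))   # S28
    return "S28"
```

The mapped domain name is passed in already computed, so the sketch covers only the comparison and classification steps, not the mapping of S21.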
Preferably, the ordered list comprises letters and/or numbers and/or symbols arranged in sequence, each letter, number or symbol appearing once and only once.
Preferably, determining the mapped domain name to be identified that corresponds to the domain name to be identified, according to the determined ordered list and mapping method, comprises the following steps:
S211, establishing, for the domain name to be identified, a mapping dictionary that is initially empty and is updated according to a preset rule;
S212, taking all or some of the character groups in the domain name to be identified as the domain name to be mapped, and mapping every character to be mapped in it, periods excepted, in order from left to right or from right to left according to the ordered list and the mapping dictionary, to obtain the target mapping character of each character to be mapped;
S213, replacing every character to be mapped in the domain name to be mapped with its target mapping character and combining the result with the original periods of the domain name to be identified, in their original order, to obtain the mapped domain name to be identified.
Preferably, in S212, the domain name to be identified with its top-level domain name removed is used as the domain name to be mapped;
in S213, every character to be mapped in the domain name to be mapped is replaced with its target mapping character, and the result is combined with the original top-level domain name and periods of the domain name to be identified, in their original order, to obtain the mapped domain name to be identified.
Preferably, before mapping, it is determined whether the number of distinct characters in the domain name to be mapped exceeds the number of characters in the ordered list; if so, the domain name to be mapped is cut so that the number of distinct characters in the cut domain name to be mapped does not exceed the number of characters in the ordered list.
Preferably, in S212, when each character to be mapped is mapped, it is first determined whether the mapping dictionary contains a mapping relation for that character;
if so, the character to be mapped is mapped to the corresponding target mapping character according to that mapping relation;
if not, it is determined which distinct character of the domain name to be mapped it is in order of first appearance (first, second, and so on), the character at the same position in the ordered list is selected as its target mapping character, the character to be mapped is mapped to that target mapping character, and the mapping relation between the two is stored in the mapping dictionary.
Preferably, in S3, after the updated domain name library to be classified is obtained, a second identification method is used to determine the type of each domain name in the domain name library to be classified;
domain names in the domain name library to be classified whose type has been determined are added to the domain name identification library;
domain names whose type has not been determined are retained in the domain name library to be classified for the next comparison with the updated domain name identification library.
A domain name identification library updating system comprises:
a domain name acquisition unit, configured to acquire a batch of domain names to be identified;
an identification unit, configured to identify each domain name to be identified, and comprising:
a mapping module, configured to determine the mapped domain name to be identified that corresponds to the domain name to be identified, according to the determined ordered list and mapping method;
a matching module, configured to compare the mapped domain name to be identified with the mapped domain names in the domain name identification library to determine whether it matches one of them; if not, it communicates with the first classification module; if so, it communicates with the second classification module;
a first classification module, configured to add the domain name to be identified to a domain name library to be classified;
a second classification module, configured to determine whether all domain names corresponding to the matched mapped domain name are of the same type; if so, it communicates with the third classification module; if not, it communicates with the fourth classification module;
a third classification module, configured to add the domain name to be identified to the domain name identification library and set its type to the type shared by all domain names corresponding to the matched mapped domain name;
a fourth classification module, configured to determine whether the proportion of legal domain names among all domain names corresponding to the matched mapped domain name reaches a first threshold; if not, it communicates with the fifth classification module; if so, it communicates with the sixth classification module;
a fifth classification module, configured to add the domain name to be identified to the domain name library to be classified;
a sixth classification module, configured to add the domain name to be identified to the domain name identification library, set its type to legal domain name, and add an in-doubt mark to it;
and, when the proportion of in-doubt domain names corresponding to a mapped domain name is determined to exceed a second threshold, the domain names carrying the in-doubt mark are transferred from the domain name identification library to the domain name library to be classified, obtaining an updated domain name library to be classified and an updated domain name identification library.
A domain name identification method identifies domain names accessed by users by using the domain name identification library obtained by the above domain name identification library updating method.
A server comprising a memory and a processor, the memory storing a program executable by the processor, the program when executed implementing a domain name identification library updating method as described in any one of the above and/or a domain name identification method as described above.
The technical solution of the invention has the following advantages:
Mapping a domain name by means of the ordered list and the mapping dictionary captures the naming pattern of the domain name, and the mapped domain names can be used to distinguish illegal domain names from legal ones. When the domain name identification library is updated, each domain name to be identified is therefore mapped and compared with the library to determine which mapped domain name in the library equals its mapped domain name, and whether it is legal is then determined from the types of the domain names corresponding to the matched mapped domain name. This meets the need to decide legality automatically while the library is being updated. Mapping with the ordered list and the mapping dictionary also allows domain name similarity to be detected quickly: it requires no complex mathematical principles, which eases development, and no complex calculation, so it is computationally efficient and fast, can handle the identification of large batches of domain names, and helps increase the update rate of the domain name identification library.
When the domain name identification library is used to identify domain names accessed by users, mapping a domain name quickly finds the domain names of the same kind, so whether the accessed domain name is legal can be determined quickly and the corresponding access control applied, which greatly improves the processing rate.
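To illustrate why mapped domain names group domain names of the same kind, the following sketch maps two domain names that share a naming pattern and looks one of them up in a small library. The function names `pattern` and `identify`, the sample domain names, the flat library layout and the "unknown" result are all hypothetical choices for this example; the 36-character ordered list is the one given as an example in the description, and a domain name with more distinct characters than that list would need the cutting step described in the text.

```python
import string

ORDERED_LIST = string.ascii_lowercase + string.digits   # a-z followed by 0-9

def pattern(domain):
    """Map every label except the top-level one, left to right, with one dictionary."""
    *labels, tld = domain.split(".")
    table = {}
    mapped_labels = []
    for label in labels:
        mapped = []
        for ch in label:
            if ch not in table:
                # First appearance: take the next unused character of the ordered list.
                table[ch] = ORDERED_LIST[len(table)]
            mapped.append(table[ch])
        mapped_labels.append("".join(mapped))
    return ".".join(mapped_labels + [tld])

def identify(domain, library):
    """library: dict from mapped domain name to 'legal' or 'illegal' (assumed layout)."""
    return library.get(pattern(domain), "unknown")

# Two hypothetical domain names with the same naming pattern share one mapped domain name.
print(pattern("qq11.example.com"))   # aabb.cdefghc.com
print(pattern("zz22.example.com"))   # aabb.cdefghc.com
```

A lookup is then a single dictionary access on the mapped domain name, which is consistent with the speed argument made above.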
Drawings
FIG. 1 is a flow diagram of the domain name identification library updating method of the present invention;
FIG. 2 is a schematic diagram of the process of mapping a domain name to be identified to obtain the mapped domain name to be identified;
FIG. 3 is a schematic diagram of the process of mapping characters to be mapped and updating the mapping dictionary;
FIG. 4 is a schematic diagram of the process of removing some of the character groups from a domain name to be identified and then mapping it to obtain the mapped domain name to be identified.
Detailed Description
The objects, advantages and features of the present invention are illustrated and explained by the following non-limiting description of preferred embodiments. These embodiments are only typical examples of the technical scheme of the invention, and all technical schemes formed by adopting equivalent substitution or equivalent transformation fall within the scope of the invention.
In the description of the embodiments, it should be noted that orientations or positional relationships indicated by terms such as "center", "upper", "lower", "left", "right", "front", "rear", "vertical", "horizontal", "inner" and "outer" are based on the orientations or positional relationships shown in the drawings. They are used merely for convenience and simplicity of description, do not indicate or imply that the apparatus or elements referred to must have a specific orientation or be configured and operated in a specific orientation, and are therefore not to be construed as limiting the present invention. Furthermore, the terms "first", "second" and "third" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.
Example 1
The domain name identification library updating method disclosed by the invention is described below with reference to the accompanying drawings. As shown in FIG. 1, the method comprises the following steps:
S1, acquiring a batch of domain names to be identified;
S2, identifying each domain name to be identified according to the following process:
S21, determining the mapped domain name to be identified that corresponds to the domain name to be identified, according to a determined ordered list and mapping method;
S22, comparing the mapped domain name to be identified with the mapped domain names in the domain name identification library to determine whether it matches one of them; if not, executing S23; if so, executing S24;
S23, adding the domain name to be identified to a domain name library to be classified;
S24, determining whether all domain names corresponding to the matched mapped domain name are of the same type; if so, executing S25; if not, executing S26; the type of a domain name is either legal or illegal;
S25, adding the domain name to be identified to the domain name identification library and setting its type to the type shared by all domain names corresponding to the matched mapped domain name;
S26, determining whether the proportion of legal domain names among all domain names corresponding to the matched mapped domain name reaches a first threshold; if not, executing S27; if so, executing S28;
S27, adding the domain name to be identified to the domain name library to be classified;
S28, adding the domain name to be identified to the domain name identification library, setting its type to legal domain name, and adding an in-doubt mark to it;
S3, after all domain names to be identified have been identified, determining the proportion of in-doubt domain names for each mapped domain name in the domain name identification library, that is, the proportion of domain names carrying the in-doubt mark among all domain names corresponding to that mapped domain name; when this proportion exceeds a second threshold for a mapped domain name, transferring the domain names carrying the in-doubt mark among all domain names corresponding to that mapped domain name from the domain name identification library to the domain name library to be classified, thereby obtaining an updated domain name library to be classified and an updated domain name identification library.
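The post-batch review of S3 can be sketched as a single pass over the library. The function name `review_in_doubt`, the dictionary-based record layout and the default `second_threshold` are assumptions for illustration; the text does not fix a threshold value or a storage format.

```python
def review_in_doubt(library, pending, second_threshold=0.5):
    """S3: move in-doubt domain names back to the to-be-classified library
    for every mapped domain name whose in-doubt proportion exceeds the threshold.

    library: dict from mapped domain name to a list of records, each a dict
             with keys "domain", "legal" and "in_doubt" (assumed layout)
    pending: list standing in for the domain name library to be classified
    """
    for entries in library.values():
        if not entries:
            continue
        doubtful = [e for e in entries if e["in_doubt"]]
        if len(doubtful) / len(entries) > second_threshold:
            pending.extend(e["domain"] for e in doubtful)
            # Keep only the domain names without the in-doubt mark.
            entries[:] = [e for e in entries if not e["in_doubt"]]
```

Only the marked domain names are transferred; the unmarked domain names under the same mapped domain name stay in the identification library, as the step describes.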
In S1, the domain names to be identified may be obtained periodically from collected traffic data, for example once a day at a fixed time: at 24:00 each day, a batch of domain names to be identified is obtained from that day's traffic data. To obtain them, a packet capture tool may be deployed on the gateway; the tool captures network message data passing through the gateway and parses domain name information from it, so the domain names involved in any Internet access made behind the gateway, that is, the domain names to be identified, can be obtained. The domain names to be identified may also be domain names stored in the domain name library to be classified: after the domain name identification library has been updated and expanded a preset number of times, the domain names in the domain name library to be classified can be compared again with the updated domain name identification library, so that they can be added to the domain name identification library.
In S21, the determined ordered list comprises letters and/or numbers and/or symbols arranged in order. The letters may be the uppercase or lowercase forms of the 26 English letters, Greek letters, and so on; the numbers may be Arabic numerals, Roman numerals, Chinese numerals, and so on; the symbols may be mathematical symbols, punctuation marks, and so on. Each letter, number or symbol appears once and only once.
For example, the ordered list is defined as follows:
[a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,x,y,z,0,1,2,3,4,5,6,7,8,9].
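As a small illustration, the example ordered list can be built and checked for the "once and only once" requirement as follows. The helper name `validate_ordered_list` is hypothetical and not part of the described method.

```python
import string

def validate_ordered_list(symbols):
    """Return the symbols as a list, rejecting any repeated symbol."""
    symbols = list(symbols)
    if len(set(symbols)) != len(symbols):
        raise ValueError("each symbol must appear once and only once")
    return symbols

# The example list: the 26 lowercase letters followed by the digits 0-9.
ORDERED_LIST = validate_ordered_list(string.ascii_lowercase + string.digits)
print(len(ORDERED_LIST))                  # 36
print(ORDERED_LIST[0], ORDERED_LIST[26])  # a 0
```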
As shown in FIG. 2, determining the mapped domain name to be identified that corresponds to the domain name to be identified, according to the determined ordered list and mapping method, comprises the following process:
S211, establishing, for the domain name to be identified, a mapping dictionary that is initially empty and is updated according to a preset rule;
S212, taking all or some of the character groups in the domain name to be identified as the domain name to be mapped, and mapping every character to be mapped in it, periods excepted, in order from left to right or from right to left according to the ordered list and the mapping dictionary, to obtain the target mapping character of each character to be mapped;
As shown in FIG. 3, when each character to be mapped is mapped in turn, it is first determined whether the mapping dictionary contains a mapping relation for that character;
if so, the character to be mapped is mapped to the corresponding target mapping character according to that mapping relation;
if not, it is determined which distinct character of the domain name to be mapped it is in order of first appearance (first, second, and so on), the character at the same position in the ordered list is selected as its target mapping character, the character to be mapped is mapped to that target mapping character, and the mapping relation between the two is stored in the mapping dictionary.
S213, replacing every character to be mapped in the domain name to be mapped with its target mapping character and combining the result with the original periods of the domain name to be identified, in their original order, to obtain the mapped domain name to be identified.
As shown in FIG. 4, to reduce the mapping workload, in S212 the domain name to be identified with its top-level domain name removed is used as the domain name to be mapped;
in S213, every character to be mapped in the domain name to be mapped is replaced with its target mapping character, and the result is combined with the original top-level domain name and periods of the domain name to be identified, in their original order, to obtain the mapped domain name to be identified.
For example, let the domain name to be identified be weixin.qq.com. An empty mapping dictionary is created for it and is updated according to the ordered list of the example above. After the top-level domain name is removed from the domain name to be identified, the corresponding domain name to be mapped is weixin.qq.
Mapping starts with the first character to be mapped, 'w'. It has no mapping relation in the current mapping dictionary { }, and it is the first distinct character to appear in the domain name to be mapped, so the first character of the ordered list, 'a', is selected as its target mapping character; the mapping relation between 'w' and 'a' is stored in the mapping dictionary, giving the updated mapping dictionary {w: a}. The second character to be mapped is 'e'. It has no mapping relation in the updated mapping dictionary and is the second distinct character to appear, so the second character of the ordered list, 'b', is selected as its target mapping character, and storing this relation gives {w: a, e: b}. Continuing in the same way, the final mapping dictionary for the domain name to be mapped is {w: a, e: b, i: c, x: d, n: e, q: f} and the mapped characters are abcdceff. Combining the mapped characters with the original periods and top-level domain name, in their original order in the domain name to be identified, gives the mapped domain name to be identified, abcdce.ff.com.
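The worked example can be reproduced with a short sketch of S211 to S213, mapping from left to right and keeping the top-level domain name unmapped. The function name `map_domain` is hypothetical, and lowercasing the input is an assumption made here because the example ordered list contains only lowercase letters; a domain name with more distinct characters than the ordered list would have to be cut first.

```python
import string

ORDERED_LIST = string.ascii_lowercase + string.digits   # the example ordered list

def map_domain(domain, ordered=ORDERED_LIST):
    """Return (mapped domain name, mapping dictionary) for one domain name."""
    *labels, tld = domain.lower().split(".")   # top-level domain name is kept as-is
    mapping = {}                               # S211: initially empty dictionary
    mapped_labels = []
    for label in labels:                       # S212: left to right, periods skipped
        out = []
        for ch in label:
            if ch not in mapping:
                # k-th distinct character seen -> k-th character of the ordered list
                mapping[ch] = ordered[len(mapping)]
            out.append(mapping[ch])
        mapped_labels.append("".join(out))
    # S213: recombine with the original periods and top-level domain name
    return ".".join(mapped_labels + [tld]), mapping

mapped, dictionary = map_domain("weixin.qq.com")
print(mapped)       # abcdce.ff.com
print(dictionary)   # {'w': 'a', 'e': 'b', 'i': 'c', 'x': 'd', 'n': 'e', 'q': 'f'}
```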
Further, to reduce the amount of calculation, the number of characters in the ordered list is limited; the 36 characters of the above example are preferred. Correspondingly, if a domain name to be mapped contains more than 36 distinct characters, the characters beyond the length of the ordered list cannot be mapped accurately. Therefore, in S212, before mapping, it is determined whether the number of distinct characters in the domain name to be mapped exceeds the number of characters in the ordered list; if so, the domain name to be mapped is cut so that the number of distinct characters in the cut domain name does not exceed the number of characters in the ordered list. That is, if the domain name to be mapped contains more than 36 distinct characters, it must be cut so that it contains no more than 36. When cutting, the second-level and third-level domain names are preferentially retained. This is because the second-level and third-level domain names generally represent the company to which the domain name belongs, and the domain names of a single company generally follow a consistent naming convention and should be of a consistent type, so retaining them helps preserve a certain accuracy.
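The text does not say exactly how the cut is made, so the following is only one possible sketch: labels are kept from the right (second-level first, then third-level, and so on) for as long as the number of distinct characters stays within the limit, and everything further left is dropped. The function name `cut_labels` is hypothetical, and the case where the second-level label alone exceeds the limit is not handled here (an empty list is returned).

```python
def cut_labels(labels, limit):
    """Cut a domain name to be mapped so its distinct characters fit the ordered list.

    labels: the labels of the domain name to be mapped, left to right,
            with the top-level domain name already removed
    limit:  the number of characters in the ordered list
    """
    kept = []
    seen = set()
    for label in reversed(labels):        # second-level label first
        candidate = seen | set(label)
        if len(candidate) > limit:
            break                         # this label and all labels left of it are cut
        seen = candidate
        kept.append(label)
    return kept[::-1]                     # restore left-to-right order

# With a deliberately small limit of 4, only the two rightmost labels fit.
print(cut_labels(["abcd", "ef", "gh"], 4))   # ['ef', 'gh']
```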
In S22, in the initial state, the domain names stored in the domain name identification library are domain names that have been manually confirmed, that is, each has been manually visited and its type determined. The type of a domain name is either legal or illegal: a legal domain name is one whose content meets legal requirements, and an illegal domain name is one whose content does not.
The domain names stored in the domain name identification library include legal domain names and illegal domain names. Each domain name in the library is also mapped, according to the ordered list and the mapping method, to obtain its mapped domain name; each domain name corresponds to one mapped domain name, while one mapped domain name may correspond to several different domain names.
The information stored in the domain name identification library comprises the domain names and, for each domain name, its mapped domain name, type and marking information.
Further, the domain name identification library may further store legal or not information corresponding to each mapped domain name, where the legal or not information is determined according to types of all domain names corresponding to the mapped domain name, for example:
If all the domain names corresponding to a mapped domain name are legal domain names, the legal or not information corresponding to the mapped domain name is legal; if all the domain names corresponding to a mapped domain name are illegal domain names, the legal or not information corresponding to the mapped domain name is illegal.
If the domain names corresponding to a mapped domain name include both legal domain names and illegal domain names, the legal or not information corresponding to the mapped domain name is the proportion of legal domain names and/or the proportion of illegal domain names.
Meanwhile, the mapped domain names may be divided into different domain name identification libraries according to their legal or not information. For example, if the legal or not information corresponding to a mapped domain name is illegal, the mapped domain name and all the domain names corresponding to it are stored in an illegal domain name identification library, so that all domain names in the illegal domain name identification library are illegal. If the legal or not information corresponding to a mapped domain name is legal, the mapped domain name and all the domain names corresponding to it are stored in a legal domain name identification library, so that all domain names in the legal domain name identification library are legal. If the legal or not information corresponding to a mapped domain name is the proportion of legal domain names and/or the proportion of illegal domain names, the mapped domain name and the domain names corresponding to it are stored in a suspicious domain name identification library.
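The division into three libraries can be sketched in Python as follows. The record layout `(domain, mapped_domain, is_legal)`, the function name `build_libraries` and the dictionary keys are hypothetical and chosen only for illustration; the text does not prescribe a storage format.

```python
from collections import defaultdict


def build_libraries(records):
    """Group domain names by mapped domain name and split them into libraries.

    `records` is an iterable of (domain, mapped_domain, is_legal) tuples.
    """
    groups = defaultdict(list)
    for domain, mapped, is_legal in records:
        groups[mapped].append((domain, is_legal))

    libraries = {"legal": {}, "illegal": {}, "suspicious": {}}
    for mapped, members in groups.items():
        # Proportion of legal domain names under this mapped domain name.
        legal_share = sum(1 for _, ok in members if ok) / len(members)
        if legal_share == 1.0:
            key = "legal"
        elif legal_share == 0.0:
            key = "illegal"
        else:
            key = "suspicious"
        libraries[key][mapped] = {"domains": members, "legal_share": legal_share}
    return libraries
```

For a suspicious entry, `legal_share` plays the role of the proportion of legal domain names, and the proportion of illegal domain names is `1 - legal_share`.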
When a domain name accessed by a user is identified, the mapped domain name to be identified corresponding to the domain name to be identified is first matched against the illegal domain name identification library; if there is no match, it is matched against the suspicious domain name identification library; and if there is still no match, it is matched against the legal domain name identification library. In this way, whether the domain name accessed by the user is an illegal domain name can be determined more quickly.
An example of how to determine whether a domain name to be identified matches a mapped domain name in the domain name identification library is described below. Suppose the mapped domain name to be identified of the domain name to be identified hhq789.com is aabcde.com, and the mapped domain name of the domain name yyl456.com stored in the domain name identification library is also aabcde.com. A mapped domain name stored in the domain name identification library is then identical to the mapped domain name to be identified, so the mapped domain name to be identified of hhq789.com matches a mapped domain name in the domain name identification library.
As another example, the mapped domain name to be identified of the domain name to be identified weixin.qq.com is abcdce.ff.com. If the domain name identification library contains no mapped domain name identical to abcdce.ff.com, it is determined that the mapped domain name to be identified abcdce.ff.com does not match any mapped domain name in the domain name identification library.
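The mapping behind these examples can be sketched in Python. The sketch follows the variant in which the top-level domain name is left unmapped and characters are processed from left to right; the concrete ordered list (lowercase letters followed by digits, 36 characters in total) and the function name `map_domain` are assumptions, and the input is assumed to have already been cut so that it contains no more distinct characters than the ordered list.

```python
import string

# Assumed ordered list: 26 lowercase letters followed by 10 digits.
ORDERED_LIST = string.ascii_lowercase + string.digits


def map_domain(domain: str, ordered_list: str = ORDERED_LIST) -> str:
    """Map each character to the ordered-list entry of its first-appearance rank."""
    *labels, tld = domain.lower().split(".")
    mapping = {}  # mapping dictionary, initially empty
    mapped_labels = []
    for label in labels:
        out = []
        for ch in label:
            if ch not in mapping:
                # The n-th new character is mapped to the n-th list entry.
                mapping[ch] = ordered_list[len(mapping)]
            out.append(mapping[ch])
        mapped_labels.append("".join(out))
    # Periods and the top-level domain name keep their original positions.
    return ".".join(mapped_labels + [tld])


print(map_domain("weixin.qq.com"))  # abcdce.ff.com
```

Two domain names match when their mapped forms are identical strings, so the comparison against the library reduces to an equality test or a dictionary lookup on the mapped domain name.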
In S23, when the domain name to be identified is added to the domain name library to be classified, the domain name to be identified and the mapping domain name corresponding to the domain name to be identified are added to the domain name library to be classified.
In S24, it is determined whether the types of all the domain names corresponding to the matched mapped domain name are consistent; that is, it is determined which of the following applies to all the domain names corresponding to the matched mapped domain name:
(1) All are legal domain names;
(2) All are illegal domain names;
(3) Both legal and illegal domain names are present.
If all the domain names corresponding to the matched mapped domain name are legal domain names, or all are illegal domain names, the types of all the domain names corresponding to the matched mapped domain name are determined to be consistent; otherwise, if both legal and illegal domain names are present, the types are determined to be inconsistent.
Correspondingly, in S25, if all the domain names corresponding to the mapped domain name matched by the domain name to be identified are legal domain names, the type of the domain name to be identified is determined to be a legal domain name; if all of them are illegal domain names, the type of the domain name to be identified is determined to be an illegal domain name. When the domain name to be identified is added to the domain name identification library, the domain name to be identified, its mapped domain name and its type are all added to the domain name identification library.
In S26, the first threshold may be set as required, for example, the first threshold is not less than 70%, and more preferably not less than 80%.
In S28, the type of the domain name to be identified is determined to be a legal domain name, and an in-doubt mark is added to the domain name to be identified, indicating that the type determined for it is in doubt and requires subsequent observation and confirmation. Correspondingly, the domain name identification library also stores marking information corresponding to each domain name. The marking information may distinguish different states by different symbols or characters, for example 0 and 1 to indicate whether the domain name is in doubt. Specifically, when the type determined for the domain name to be identified is in doubt, the marking information corresponding to the domain name to be identified is 1, which constitutes the in-doubt mark; otherwise, when the type determined for the domain name to be identified is not in doubt, the marking information corresponding to the domain name to be identified is 0.
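The branching of S22 to S28 for a single domain name to be identified can be summarised in a short Python sketch. The in-memory layout (a dictionary from mapped domain name to a list of entries, and a list for the domain name library to be classified), the function name `classify` and the returned labels are hypothetical; the default first threshold of 0.8 is one of the values the text gives as preferred.

```python
def classify(domain, mapped, library, pending, first_threshold=0.8):
    """Apply S22-S28 to one domain name; returns a short outcome label.

    `library` maps a mapped domain name to a list of entries of the form
    {"domain": str, "legal": bool, "in_doubt": bool}.
    `pending` is the domain name library to be classified.
    """
    members = library.get(mapped)
    if not members:                          # S22: no match -> S23
        pending.append((domain, mapped))
        return "pending"

    legal_count = sum(1 for m in members if m["legal"])
    if legal_count == len(members):          # S24/S25: all legal
        members.append({"domain": domain, "legal": True, "in_doubt": False})
        return "legal"
    if legal_count == 0:                     # S24/S25: all illegal
        members.append({"domain": domain, "legal": False, "in_doubt": False})
        return "illegal"

    if legal_count / len(members) >= first_threshold:   # S26 -> S28
        members.append({"domain": domain, "legal": True, "in_doubt": True})
        return "in_doubt"
    pending.append((domain, mapped))         # S26 -> S27
    return "pending"
```

Here the `in_doubt` field corresponds to the marking information, with `True` standing for 1 and `False` for 0.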
In S3, for each mapped domain name in the domain name identification library, it is confirmed in turn whether the proportion of domain names carrying the in-doubt mark among all the domain names corresponding to that mapped domain name exceeds a second threshold; that is, the proportion of domain names whose marking information is 1 among the domain names corresponding to each mapped domain name is checked. The second threshold may be set as required, for example to a value between 10% and 20%. Taking 20% as an example, if the proportion of domain names whose marking information is 1 among the domain names corresponding to a mapped domain name exceeds 20%, the domain names whose marking information is 1 and their related information are moved out of the domain name identification library and added to the domain name library to be classified.
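A minimal Python sketch of the S3 check follows, assuming the library is held as a dictionary from mapped domain name to a list of entries with an `in_doubt` field. This layout and the function name `sweep_in_doubt` are illustrative only; the default second threshold of 0.2 is the 20% value from the example.

```python
def sweep_in_doubt(library, pending, second_threshold=0.2):
    """Move in-doubt domain names out of the library where their share is too high.

    `library` maps a mapped domain name to a list of entries of the form
    {"domain": str, "legal": bool, "in_doubt": bool}; `pending` is the
    domain name library to be classified.
    """
    for mapped, members in list(library.items()):
        doubtful = [m for m in members if m["in_doubt"]]
        if members and len(doubtful) / len(members) > second_threshold:
            # Keep only the confirmed entries under this mapped domain name.
            library[mapped] = [m for m in members if not m["in_doubt"]]
            pending.extend((m["domain"], mapped) for m in doubtful)
```

The comparison is strict because the text speaks of the proportion exceeding the second threshold, so a share of exactly 20% leaves the entries in place.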
After the steps are completed, an updated domain name recognition library and a domain name library to be classified are obtained.
Further, a second identification method is used to identify the domain names in the domain name library to be classified. For example, the second identification method may be manual detection: the domain names concerned are accessed manually and the page content is checked to determine the corresponding type. If the type can be determined, the domain name, its corresponding mapped domain name and its type are added to the domain name identification library; otherwise, the domain name is retained in the domain name library to be classified to await a second comparison.
The second identification method may also determine the type of a domain name by crawling web page content, converting the crawled content into vector data, and inputting the vector data into a trained classification model. Training a classification model on crawled web page content and classifying with that model is known technology. For example, given a domain name, a script may be written in a programming language such as Python; when executed, the script simulates manually accessing the website corresponding to the domain name and saves the web page content. After crawling, information such as the text content and the IP geographic location obtained from the web page is converted into vector data, which is then used to train a model from an existing Python model library, for example a FastText model. Once trained to convergence, the classification model can be used for classification.
In addition, the domain names in the domain name library to be classified may be compared with the updated domain name identification library at a certain period, for example once per week.
Example 2
The embodiment discloses a domain name recognition library updating system, which comprises:
the domain name acquisition unit is used for acquiring a batch of domain names to be identified;
the identifying unit is used for identifying each domain name to be identified, and comprises the following steps:
the mapping module is used for determining a mapping domain name to be identified corresponding to the domain name to be identified according to the determined ordered list and the mapping method;
The matching module is used for comparing the mapped domain name to be identified with the mapped domain names in the domain name identification library to determine whether the mapped domain name to be identified matches a mapped domain name in the domain name identification library; if not, it communicates with the first classification module, and if so, it communicates with the second classification module;
The first classification module is used for adding the domain name to be identified into a domain name library to be classified;
the second classification module is used for determining whether the types of all domain names corresponding to the mapping domain names in the matching are consistent, if so, the second classification module is communicated with the third classification module, and if not, the second classification module is communicated with the fourth classification module;
The third classification module is used for adding the domain name to be identified into the domain name identification library and determining that the type of the domain name to be identified is the type of all domain names corresponding to the mapping domain name in the matching;
The fourth classification module is used for determining whether the proportion of legal domain names among all the domain names corresponding to the matched mapped domain name reaches a first threshold; if not, it communicates with the fifth classification module, and if so, it communicates with the sixth classification module;
A fifth classification module, configured to add the domain name to be identified to a domain name library to be classified;
A sixth classification module, configured to add the domain name to be identified to the domain name identification library, determine that the type of the domain name to be identified is a legal domain name, and add an in-doubt flag to the domain name to be identified;
When the proportion of in-doubt domain names corresponding to a mapped domain name is determined to exceed a second threshold, the domain names carrying the in-doubt mark are transferred from the domain name identification library to the domain name library to be classified, yielding an updated domain name library to be classified and an updated domain name identification library.
Example 3
The embodiment discloses a domain name recognition method, which uses a domain name recognition library obtained by the domain name recognition library updating method to recognize the domain name accessed by a user.
The mapped domain name to be identified, corresponding to the domain name accessed by the user, is compared with the mapped domain names in the domain name identification library to determine whether it matches a mapped domain name in the domain name identification library. If not, whether the domain name accessed by the user is legal is determined by another known domain name identification method, for example by using the classification model described above to determine the type of the domain name. If so, whether the domain name accessed by the user is legal is determined according to the domain names corresponding to the matched mapped domain name and/or the legal or not information of the matched mapped domain name.
When determining whether the domain name accessed by the user is legal or not according to the situation of the domain name corresponding to the mapping domain name in the matching, determining according to the following process:
Determining the type of the domain name corresponding to the mapped domain name in the matching,
If all the domain names corresponding to the mapping domain names in the matching are legal domain names, determining that the domain names accessed by the user are legal domain names;
if all the domain names corresponding to the mapping domain names in the matching are illegal domain names, determining that the domain names accessed by the user are illegal domain names;
If it is determined that the domain names corresponding to the matched mapped domain name include both legal and illegal domain names, whether the domain name accessed by the user is a legal domain name may be determined according to the proportion of legal domain names among them. For example, when the proportion of legal domain names among the domain names corresponding to the matched mapped domain name is greater than 80%, the domain name accessed by the user is determined to be a legal domain name; otherwise, it is determined to be an illegal domain name.
Alternatively, if the domain names corresponding to the matched mapped domain name are not all legal domain names, the type of the domain name accessed by the user may be determined by crawling the web page content and classifying it through the classification model.
When determining whether the domain name accessed by the user is legal or not according to the legal or not information corresponding to the mapping domain name in the matching, determining according to the following process:
if the legal or not information corresponding to the mapping domain name in the matching is determined to be legal, determining the domain name accessed by the user as a legal domain name;
if the legal or not information corresponding to the mapping domain name in the matching is determined to be illegal, determining the domain name accessed by the user as an illegal domain name;
If the legal or not information corresponding to the matched mapped domain name is the proportion of legal domain names and/or the proportion of illegal domain names, the domain name accessed by the user may first be matched against all the illegal domain names corresponding to the matched mapped domain name; if it matches, the domain name accessed by the user is determined to be an illegal domain name. If it does not match, the domain name accessed by the user is matched against all the legal domain names corresponding to the matched mapped domain name; if it matches, the domain name accessed by the user is determined to be a legal domain name. If it still does not match, whether the domain name accessed by the user is legal is determined according to the proportion of legal domain names and/or the proportion of illegal domain names: for example, when the proportion of legal domain names is sufficiently high, the domain name accessed by the user is determined to be a legal domain name; otherwise, it is determined to be an illegal domain name.
This may be more efficient. Of course, the identification can be performed in both ways to mutually verify, thereby ensuring accuracy.
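The identification of a domain name accessed by a user can be sketched in Python as follows. The dictionary-based library layout, the function name `identify` and the returned labels are hypothetical; the cut-off of 0.8 for the proportion of legal domain names is borrowed from the 80% example given for the first way of determining legality and is an assumption when applied here.

```python
def identify(domain, mapped, library, legal_threshold=0.8):
    """Decide whether an accessed domain name is legal, illegal or unknown.

    `library` maps a mapped domain name to a list of entries of the form
    {"domain": str, "legal": bool}.
    """
    members = library.get(mapped)
    if not members:
        return "unknown"  # no match: fall back to another identification method

    legal = {m["domain"] for m in members if m["legal"]}
    illegal = {m["domain"] for m in members if not m["legal"]}

    if not illegal:
        return "legal"      # legal or not information is "legal"
    if not legal:
        return "illegal"    # legal or not information is "illegal"

    # Mixed case: exact matches first, then the proportion of legal names.
    if domain in illegal:
        return "illegal"
    if domain in legal:
        return "legal"
    share = len(legal) / len(members)
    return "legal" if share > legal_threshold else "illegal"
```

A caller would map `"legal"` to allowing access and `"illegal"` to blocking it, and would hand `"unknown"` to a second identification method such as the classification model.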
If the domain name accessed by the user is determined to be a legal domain name, the user is allowed to access it; otherwise, the user is prevented from accessing the domain name by means such as DNS interception, and a corresponding prompt is issued.
With the above method, illegal domain names can be identified quickly. This is because most illegal domain names follow fixed naming patterns, and these naming patterns can be recognized quickly through the ordered list and the mapping method. For example, suppose the domain name identification library stores the domain name hhq789.com of the above example, that it is an illegal domain name, and that its corresponding mapped domain name is aabcde.com. If the domain name accessed by the user is yyl456.com, whose mapped domain name to be identified is also aabcde.com, then after the mapped domain name to be identified is matched against the domain name identification library, the domain name yyl456.com accessed by the user can be quickly confirmed to be an illegal domain name.
Example 4
The embodiment discloses a server, which comprises a memory and a processor, wherein the memory stores a program which can be executed by the processor, and the program realizes the domain name identification library updating method when being executed.
The invention has various embodiments, and all technical schemes formed by equivalent replacement or equivalent transformation fall within the protection scope of the invention.

Claims (10)

1. The method for updating the domain name recognition library is characterized by comprising the following steps of:
S1, acquiring a batch of domain names to be identified;
S2, identifying each domain name to be identified according to the following process;
S21, determining a mapping domain name to be identified corresponding to the domain name to be identified according to the determined ordered list and the mapping method;
S22, comparing the mapping domain name to be identified with the mapping domain name in the domain name identification library to determine whether the mapping domain name to be identified is matched with a mapping domain name in the domain name identification library, if not, executing S23, if so, executing S24;
S23, adding the domain name to be identified into a domain name library to be classified;
S24, determining whether the types of all the domains corresponding to the mapping domains in the matching are consistent, if so, executing S25, and if not, executing S26, wherein the type of one domain refers to whether the one domain is a legal domain or an illegal domain;
S25, adding the domain name to be identified into the domain name identification library, and determining that the type of the domain name to be identified is the type of all domain names corresponding to the mapping domain name in the matching;
S26, determining whether the duty ratio of legal domain names in all domain names corresponding to the mapping domain names in the matching reaches a first threshold value, if not, executing S27, if so, executing S28;
S27, adding the domain name to be identified into a domain name library to be classified;
S28, adding the domain name to be identified into the domain name identification library, determining that the type of the domain name to be identified is a legal domain name, and adding an in-doubt mark for the domain name to be identified;
And S3, after all the domains to be identified are identified, confirming the duty ratio of the in-doubt domain name corresponding to each mapping domain name in the domain name identification library, wherein the duty ratio of the in-doubt domain name corresponding to one mapping domain name is the duty ratio of the domain name with the in-doubt mark in all the domain names corresponding to the mapping domain name, and when the duty ratio of the in-doubt domain name corresponding to one mapping domain name is determined to exceed a second threshold value, transferring the domain name with the in-doubt mark in all the domain names corresponding to the mapping domain name from the domain name identification library to the domain name library to be classified, so as to obtain an updated domain name library to be classified and a domain name identification library.
2. The method of claim 1, wherein the ordered list comprises letters and/or numbers and/or symbols arranged in sequence, one and only one of each letter and/or number and/or symbol.
3. The method for updating a domain name recognition library according to claim 1, wherein determining the mapped domain name to be identified corresponding to the domain name to be identified according to the determined ordered list and the mapping method comprises the following steps:
S211, establishing, for the domain name to be identified, a mapping dictionary which is initially empty and is updated according to a preset rule;
S212, taking all or part of the character groups in the domain name to be identified as the domain name to be mapped, and mapping all the characters to be mapped other than periods in the domain name to be mapped, in order from left to right or from right to left, according to the ordered list and the mapping dictionary, to obtain the target mapping character of each character to be mapped;
S213, mapping all the characters to be mapped in the domain name to be mapped into the corresponding target mapping characters, and combining them with the original periods in the domain name to be identified according to the original positions of the periods in the domain name to be identified, to obtain the mapped domain name to be identified.
4. A domain name recognition base updating method according to claim 3, wherein:
in the step S212, the top-level domain name in the domain name to be identified is removed and then used as the domain name to be mapped;
In S213, the to-be-mapped characters in the to-be-mapped domain name are all mapped into corresponding target mapping characters, and then combined with the original top-level domain name and period in the to-be-identified domain name according to their original sequence in the to-be-identified domain name to obtain the to-be-identified mapped domain name.
5. The method of claim 3, wherein before mapping, it is determined whether the number of distinct characters in the domain name to be mapped exceeds the number of characters in the ordered list, and if so, the domain name to be mapped is cut so that the number of distinct characters in the cut domain name to be mapped does not exceed the number of characters in the ordered list.
6. The method for updating a domain name recognition library according to claim 3, wherein in the step S212, when each character to be mapped is mapped, it is determined whether a mapping relation corresponding to the character to be mapped exists in the mapping dictionary;
if yes, mapping the character to be mapped into a corresponding target mapping character according to the mapping relation;
If not, determining the ordinal position of the character to be mapped among the characters appearing for the first time in the domain name to be mapped, selecting the character at the corresponding position in the ordered list as the target mapping character of the character to be mapped, mapping the character to be mapped into the target mapping character, and storing the mapping relation between the character to be mapped and the target mapping character into the mapping dictionary.
7. The method for updating a domain name recognition library according to any one of claims 1 to 6, wherein in S3, after obtaining the updated domain name library to be classified, a second recognition method is used to determine the type of each domain name in the domain name library to be classified;
adding the domain name with the determined type in the domain name library to be classified into the domain name identification library;
retaining, in the domain name library to be classified, the domain names whose types are not identified, for the next comparison with the updated domain name identification library.
8. A domain name recognition library updating system, comprising:
the domain name acquisition unit is used for acquiring a batch of domain names to be identified;
the identifying unit is used for identifying each domain name to be identified, and comprises the following steps:
the mapping module is used for determining a mapping domain name to be identified corresponding to the domain name to be identified according to the determined ordered list and the mapping method;
The matching module is used for comparing the mapped domain name to be identified with the mapped domain names in the domain name identification library to determine whether the mapped domain name to be identified matches a mapped domain name in the domain name identification library; if not, it communicates with the first classification module, and if so, it communicates with the second classification module;
The first classification module is used for adding the domain name to be identified into a domain name library to be classified;
The second classification module is used for determining whether the types of all the domain names corresponding to the mapping domain names in the matching are consistent, if so, the second classification module is communicated with the third classification module, and if not, the second classification module is communicated with the fourth classification module;
The third classification module is used for adding the domain name to be identified into the domain name identification library and determining that the type of the domain name to be identified is the type of all domain names corresponding to the mapping domain name in the matching;
The fourth classification module is used for determining whether the proportion of legal domain names among all the domain names corresponding to the matched mapped domain name reaches a first threshold; if not, it communicates with the fifth classification module, and if so, it communicates with the sixth classification module;
A fifth classification module, configured to add the domain name to be identified to a domain name library to be classified;
A sixth classification module, configured to add the domain name to be identified to the domain name identification library, determine that the type of the domain name to be identified is a legal domain name, and add an in-doubt flag to the domain name to be identified;
When the proportion of in-doubt domain names corresponding to a mapped domain name is determined to exceed a second threshold, the domain names carrying the in-doubt mark are transferred from the domain name identification library to the domain name library to be classified, yielding an updated domain name library to be classified and an updated domain name identification library.
9. A domain name recognition method, characterized in that the domain name recognition method is used for performing domain name recognition of user access by using the domain name recognition library obtained by the domain name recognition library updating method according to any one of claims 1 to 7.
10. Server comprising a memory and a processor, said memory storing a program executable by said processor, characterized in that said program, when executed, implements a domain name identification library updating method according to any one of claims 1-7 and/or a domain name identification method according to claim 9.
CN202411747484.6A 2024-12-02 2024-12-02 Domain name identification library updating method, system, domain name identification method and server Pending CN119232498A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202411747484.6A CN119232498A (en) 2024-12-02 2024-12-02 Domain name identification library updating method, system, domain name identification method and server

Publications (1)

Publication Number Publication Date
CN119232498A true CN119232498A (en) 2024-12-31

Family

ID=94047756

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202411747484.6A Pending CN119232498A (en) 2024-12-02 2024-12-02 Domain name identification library updating method, system, domain name identification method and server

Country Status (1)

Country Link
CN (1) CN119232498A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090043720A1 (en) * 2007-08-10 2009-02-12 Microsoft Corporation Domain name statistical classification using character-based n-grams
CN111245784A (en) * 2019-12-30 2020-06-05 杭州安恒信息技术股份有限公司 Method for multi-dimensional detection of malicious domain name
CN112256838A (en) * 2020-11-06 2021-01-22 山东伏羲智库互联网研究院 Similar domain name searching method and device and electronic equipment
CN113542202A (en) * 2020-04-21 2021-10-22 深信服科技股份有限公司 A method, apparatus, device and computer-readable storage medium for identifying a domain name
CN117640576A (en) * 2023-11-30 2024-03-01 北京锐安科技有限公司 Domain name classification method and device, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
CN112347244B (en) Method for detecting pornography-related and gambling-related websites based on mixed feature analysis
US9189746B2 (en) Machine-learning based classification of user accounts based on email addresses and other account information
CN112866023B (en) Network detection method, model training method, device, equipment and storage medium
CN109005145B (en) A malicious URL detection system and method based on automatic feature extraction
US9384389B1 (en) Detecting errors in recognized text
CN108092963B (en) Webpage identification method and device, computer equipment and storage medium
CN111343203B (en) Sample recognition model training method, malicious sample extraction method and device
CN102622553A (en) Method and device for detecting webpage safety
CN111753171B (en) Malicious website identification method and device
WO2007143914A1 (en) Method, device and inputting system for creating word frequency database based on web information
CN108566399A (en) Phishing website identification method and system
CN112333185B (en) Domain shadowing detection method and device based on DNS (Domain Name System) resolution
CN108768982B (en) Phishing website detection method and device, computing equipment and computer storage medium
CN112948725A (en) Phishing website URL detection method and system based on machine learning
CN111666928A (en) Computer file similarity recognition system and method based on image analysis
CN110855635A (en) URL (Uniform Resource Locator) identification method and device and data processing equipment
CN119232498A (en) Domain name identification library updating method, system, domain name identification method and server
CN117749496A (en) An email-based risk warning information generation method, device and medium
CN116192439A (en) Malicious website identification method, device, equipment and storage medium
CN112771524A (en) Camouflage detection based on fuzzy inclusion
CN111488622A (en) Method and device for detecting webpage tampering behavior and related components
CN115186240A (en) Social network user alignment method, device and medium based on relevance information
CN113361597A (en) URL detection model training method and device, electronic equipment and storage medium
CN110020366B (en) Mailbox information extraction method and device
CN107247708B (en) Surname identification method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination