CN107800686A

CN107800686A - A phishing website identification method and device

Info

Publication number: CN107800686A
Application number: CN201710873546.1A
Authority: CN
Inventors: 耿光刚; 延志伟; 张茜
Original assignee: China Internet Network Information Center
Current assignee: China Internet Network Information Center
Priority date: 2017-09-25
Filing date: 2017-09-25
Publication date: 2018-03-13
Anticipated expiration: 2037-09-25
Also published as: CN107800686B

Abstract

The present invention relates to a phishing website identification method and device. The method includes: detecting whether resources of other websites are embedded in the website to be detected; if no resources of other websites are embedded, determining that the website to be detected is a non-phishing website; if resources of other websites are embedded, determining whether the domain names of those other websites intersect with a whitelist; if there is no intersection, determining that the website to be detected is a non-phishing website; if there is an intersection, determining that the website to be detected is a highly suspected phishing website; and performing legitimacy determination and domain name credit evaluation on the highly suspected phishing website to determine whether the website to be detected is a phishing website. The present invention makes up for the inability of blacklist techniques to filter newly emerging phishing websites, efficiently identifies phishing websites that embed elements and resources of brand websites, and improves the performance of phishing filtering.

Description

Method and device for identifying phishing websites

Technical field

The invention belongs to the technical fields of information technology and network security, and in particular relates to a phishing website identification method and device.

Background

The term "phishing" was coined in 1996 and evolved from the word "fishing". In a phishing attack, the attacker sends bait (such as emails or SMS messages) to a large number of users, expecting a small number of them to "take the bait", thereby achieving the goal of "phishing" (for example, stealing users' private information). The Anti-Phishing Working Group (APWG) defines phishing as a cyber attack that uses social engineering and technical means to steal consumers' personal identity data and financial account credentials. Phishing attacks based on social engineering typically send users deceptive emails or text messages that appear to come from legitimate companies or institutions, luring them to reply with sensitive personal information or to click embedded links that lead to fake websites, where they leak credentials (e.g. username and password) or download malware. Phishing seriously threatens the property and privacy of Internet users and has become one of the biggest security risks on the Internet today.

Blacklisting is widely used and is one of the main phishing-filtering techniques. For example, the Google Safe Browsing API used in Google Chrome, Mozilla Firefox and Apple Safari checks whether a URL appears in a continuously updated blacklist provided by Google to determine whether the URL is a phishing or malicious web page. Blacklisting is simple and easy to use but has an obvious drawback: it can do nothing about phishing websites that are not on the list; in other words, it cannot filter newly emerging phishing websites.

Summary of the invention

To address the above problems, the present invention provides a phishing website identification method and device that make up for the inability of blacklist techniques to filter newly emerging phishing websites, efficiently identify phishing websites that embed elements and resources of brand websites, and improve the performance of phishing filtering.

By analyzing phishing reports from PhishTank and the Anti-Phishing Alliance of China, the present invention found that, to make the counterfeit more convincing, most phishing websites directly use resources of the brand website (logo, CSS, etc.); when a user visits such a phishing website through a browser, a query request for the brand website's domain name is issued immediately. The present invention exploits this characteristic of phishing websites and identifies them by analyzing Domain Name System (DNS) resolution data.

The technical solution adopted by the present invention is as follows:

A phishing website identification method, comprising the following steps:

Detecting whether resources of other websites are embedded in the website to be detected;

If no resources of other websites are embedded in the website to be detected, determining that the website to be detected is a non-phishing website;

If resources of other websites are embedded in the website to be detected, determining whether the domain names of those other websites intersect with the whitelist; if there is no intersection, determining that the website to be detected is a non-phishing website; if there is an intersection, determining that the website to be detected is a highly suspected phishing website;

Performing legitimacy determination and domain name credit evaluation on the highly suspected phishing website to determine whether the website to be detected is a phishing website.

Further, before detecting whether resources of other websites are embedded in the website to be detected, it is determined whether the domain name of the website to be detected is in the whitelist; if it is, the website to be detected is directly determined to be a non-phishing website.
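The description does not specify how a domain name is matched against the whitelist. One plausible reading, assumed here, is that a host matches if it equals a whitelisted domain or is a subdomain of one, so that `www.paypalobjects.com` matches an entry `paypalobjects.com`. The function `in_whitelist` is hypothetical and this is only a sketch.

```python
def in_whitelist(host: str, whitelist: set[str]) -> bool:
    """True if host equals a whitelisted domain or is a subdomain of one.

    Suffix matching on whole DNS labels is an assumption of this sketch;
    whitelist entries are assumed to be lowercase without a trailing dot.
    """
    labels = host.lower().rstrip(".").split(".")
    # Check every label-aligned suffix: a.b.c -> "a.b.c", "b.c", "c".
    return any(".".join(labels[i:]) in whitelist for i in range(len(labels)))
```

Matching on label boundaries matters: a look-alike host such as `paypal.com-webapps-security.com` does not match the entry `paypal.com`, because `paypal.com` is never a whole-label suffix of it.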

Further, whether resources of other websites are embedded in the website to be detected is determined by detecting whether the web page source code of the website to be detected contains links to resources of other websites, or by detecting whether the browser issues DNS query requests for other domain names while accessing the website to be detected.

Further, a browser plug-in monitors the browser's network behavior in real time to capture the network resource requests issued while the browser loads a page of the website to be detected, and compares the queried domain names with the domain name of the website to be detected, thereby determining whether DNS query requests for other domain names have been issued.

Further, a local recursive DNS server is set up and its DNS query log is analyzed to determine whether the browser issues DNS query requests for other domain names while accessing the website to be detected.

Further, the computer's DNS client cache is disabled and the DNS client is configured to use only the local recursive DNS server for DNS queries, so that the DNS query log completely records the DNS query requests the browser issues when loading a page.

Further, a non-existent domain name is selected, and the DNS query records for that domain name are used as separators between the query records of different web pages in the DNS query log.

A phishing website identification device, comprising:

A detection unit, configured to detect whether resources of other websites are embedded in the website to be detected;

A first determination unit, configured to determine that the website to be detected is a non-phishing website when no resources of other websites are embedded in it;

A whitelist comparison unit, configured to determine whether the domain names of other websites embedded in the website to be detected intersect with the whitelist;

A second determination unit, configured to determine that the website to be detected is a non-phishing website when the domain names of the other websites do not intersect with the whitelist, and to determine that the website to be detected is a highly suspected phishing website when they do;

An evaluation unit, configured to perform legitimacy determination and domain name credit evaluation on the highly suspected phishing website;

A third determination unit, configured to determine, according to the result obtained by the evaluation unit, whether the website to be detected is a phishing website.

Further, the detection unit determines whether resources of other websites are embedded in the website to be detected by detecting whether the web page source code of the website to be detected contains links to resources of other websites; alternatively, the detection unit is a browser plug-in that monitors the browser's network behavior in real time, captures the network resource requests issued while the browser loads a page of the website to be detected, and compares the queried domain names with the domain name of the website to be detected to determine whether DNS query requests for other domain names have been issued, and hence whether resources of other websites are embedded in the website to be detected.

Further, the detection unit is a local recursive DNS server, which analyzes its DNS query log to determine whether the browser issues DNS query requests for other domain names while accessing the website to be detected, and hence whether resources of other websites are embedded in the website to be detected.

Compared with the prior art, the present invention has the following beneficial effects:

1. It is easy to implement as a browser plug-in, enabling online real-time identification with timely feedback that warns users so they are not deceived.

2. It can be used together with blacklisting, the two complementing each other. Before applying the present invention, the domain name of the URL to be detected can be matched against the blacklist; if the domain name is on the blacklist, the URL can be identified as phishing without further analysis, which effectively improves identification efficiency. Conversely, if the domain name is not on the blacklist but the present invention identifies the URL as phishing, the corresponding domain name can be added to the blacklist, thereby extending it.

3. It is easy to extend. To cover phishing against a new brand, it suffices to add the domain names hosting that brand's resources to the whitelist. The key to the present invention is maintaining a complete and valid whitelist; compared with a blacklist, a whitelist made up of legitimate brand domain names is relatively stable and easier to maintain and update.

4. It is language-independent. None of the steps of the present invention depends on the language of the phishing website, so counterfeits of brands worldwide can be identified. The present invention is therefore not restricted by the language of the website and has a wider range of application than other phishing identification methods.
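Effect 2 above, combining a blacklist with the present method, can be sketched as a small wrapper. This is an illustration only: `identify_with_blacklist` is a hypothetical name, and `identify` stands in for the whitelist-based identification, assumed to return `True` for phishing.

```python
from typing import Callable


def identify_with_blacklist(domain: str,
                            blacklist: set[str],
                            identify: Callable[[str], bool]) -> bool:
    """Return True if the domain is judged phishing (hypothetical wrapper)."""
    # A blacklist hit settles the case; no further identification is needed.
    if domain in blacklist:
        return True
    is_phishing = identify(domain)
    # A domain newly identified as phishing extends the blacklist in place.
    if is_phishing:
        blacklist.add(domain)
    return is_phishing
```

Mutating the blacklist in place keeps the sketch short; a deployed system would more likely persist new entries and review them before sharing.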

Brief description of the drawings

Figure 1 is a schematic diagram of a phishing website.

Figure 2 is a screenshot of a source code fragment of the phishing website shown in Figure 1.

Figure 3 is a flowchart of the phishing website identification method in an embodiment.

Figure 4 is a schematic diagram of the units of the phishing website identification device in an embodiment.

Detailed description of the embodiments

The present invention is described in further detail below with reference to specific embodiments and the accompanying drawings.

Phishing is essentially brand counterfeiting. Phishers send false information through email, instant messaging and similar channels to lure users to counterfeit websites built in advance, in order to defraud users of their private information and property. The counterfeit website, being the central crime scene, is usually visually very similar to the genuine brand website so that users take it for the real one. Today, websites (especially those of major brands) are no longer simple text and pictures; they contain many elements and resources in a distinctive brand style, including logo images, favicon images, CSS files and JS files. To pass as genuine, counterfeit phishing websites often use these brand-website resources directly, i.e. they embed links to these resources in the web page source code. For example, https://wvw.paypal-limited.com-webapps-security.com is a phishing website impersonating PayPal (http://www.paypal.com); its appearance is shown in Figure 1.

This login page is almost identical to the login page of the official PayPal website. A screenshot of a source code fragment of the site is shown in Figure 2, from which it can be seen that the phishing website uses PayPal's favicon image, CSS files and JS files (note: PayPal's resources are all hosted at www.paypalobjects.com). Consequently, when a user visits https://wvw.paypal-limited.com-webapps-security.com/ through a browser, the browser first issues a query request for the domain name "com-webapps-security.com" and then immediately issues a query request for the domain name "paypalobjects.com". The method of the present invention fully exploits this characteristic of phishing websites to identify them efficiently.
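The domain names quoted in this example can be derived from the URLs involved. A minimal sketch, assuming the registered domain is approximated by the last two DNS labels; that heuristic is wrong for multi-label suffixes such as `.com.cn`, where a real implementation would consult the Public Suffix List. `registered_domain` is a hypothetical helper.

```python
from urllib.parse import urlsplit


def registered_domain(url: str) -> str:
    """Approximate the registered domain of a URL as its last two labels.

    The two-label heuristic is an assumption of this sketch only.
    """
    # .hostname is lowercased and strips any port and credentials.
    host = (urlsplit(url).hostname or "").rstrip(".")
    return ".".join(host.split(".")[-2:])
```

Applied to the phishing URL this yields `com-webapps-security.com`, and applied to a resource URL on `www.paypalobjects.com` it yields `paypalobjects.com`, matching the two queries described above. Note how the deceptive leading labels (`wvw.paypal-limited`) disappear once the host is reduced to its registered domain.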

The flow of the phishing website identification method of the present invention is shown in Figure 3. For each URL entered by the user, the following process is performed:

1. Using the existing whitelist library, determine whether the domain name of the URL to be detected is in the whitelist. If it is, the URL is not a phishing website and the identification process ends; otherwise, proceed to step 2.

2. Use a browser to issue a query request for the Domain, access the server hosting the Domain and load the page, and determine whether query requests for other domain names (newDomains) are issued during this process. If not, the URL is considered not to be a phishing website and the process ends; otherwise, proceed to the next identification step.

3. Determine whether any domain name in newDomains is in the whitelist (i.e. whether newDomains intersects the whitelist). If not, the URL is considered not to be a phishing website; otherwise, the URL is considered a highly suspected phishing website and is examined further.

4. For a highly suspected phishing website, further perform legitimacy determination and domain name credit evaluation to finally determine whether the website is phishing. Legitimacy determination judges whether the suspected phishing website legitimately uses the corresponding brand's whitelisted domain name; domain name credit evaluation scores the domain name to judge whether the website's domain name is trustworthy.

In the last step, for a highly suspected phishing website, the following can further be analyzed: whether the Domain is indexed by search engines (if it is, the website is not phishing); whether the Domain and the matched whitelisted domain (whiteDomain) are registered by the same registrant (if so, not phishing); and whether the resolved IP addresses of the Domain and whiteDomain belong to the same AS (Autonomous System) (if so, not phishing). If none of these conditions holds, the website is deemed phishing.
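The three exoneration checks of the last step can be expressed as a short decision function. This is a sketch under stated assumptions: the evidence (search-engine indexing, WHOIS registrant, AS numbers of resolved IPs) is taken as already gathered, the `Evidence` fields and `is_phishing` are hypothetical names, and "same AS" is modelled as a non-empty intersection of AS numbers. The domain credit scoring is not modelled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Evidence:
    """Hypothetical pre-collected facts about a suspect Domain and its whiteDomain."""
    indexed_by_search_engine: bool
    registrant: Optional[str]        # WHOIS registrant of the Domain, if known
    white_registrant: Optional[str]  # WHOIS registrant of whiteDomain, if known
    asns: set[int]                   # AS numbers of the IPs the Domain resolves to
    white_asns: set[int]             # AS numbers of the IPs whiteDomain resolves to


def is_phishing(ev: Evidence) -> bool:
    """Phishing only if none of the three exoneration checks holds."""
    if ev.indexed_by_search_engine:
        return False
    # Unknown registrants (None) must not count as "the same registrant".
    if ev.registrant is not None and ev.registrant == ev.white_registrant:
        return False
    if ev.asns & ev.white_asns:  # resolved IPs share an Autonomous System
        return False
    return True
```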

The key point of the present invention is to confirm whether the web page source code of the URL to be detected embeds links to elements and resources of brand websites, i.e. whether the browser issues query requests for other domain names (newDomains) when visiting the URL. The present invention does not limit the specific implementation, which may be based on page content analysis, browser query monitoring, recursive DNS log analysis and other approaches; an embodiment of each is given below.

1. Analyzing the web page source code

The most direct manifestation of a counterfeit phishing website using brand-website resources is the embedding of links to those resources in its web page source code. In web page source code, resources such as logo images, favicon images, CSS files and JS files are generally referenced through the "href" and "src" attributes.

Therefore, the present invention fetches the web page source code of the URL to be detected and uses regular expressions to extract the values of the "href" and "src" attributes in the code segments that reference resources such as logos, favicons, CSS and JS. These values are the links that load the corresponding resources, from which the domain names of the links are obtained. The domain names of these resource links are then compared with the domain name of the URL to be detected; if any of them differs, the URL is considered to embed resources of other brand websites, i.e. brand counterfeiting is possible.
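The extraction and comparison just described can be sketched in Python with the standard `re` and `urllib.parse` modules. Assumptions of this sketch: only `href`/`src` attributes on `<link>`, `<script>` and `<img>` tags are considered (to approximate "code segments that reference resources"), relative links are resolved against the page URL, and comparison is done on full host names rather than registered domains. `foreign_resource_hosts` is a hypothetical name, and a regex remains a rough approximation of real HTML parsing.

```python
import re
from urllib.parse import urljoin, urlsplit

# href/src values on <link>, <script> and <img> tags, quoted with " or '.
RESOURCE_RE = re.compile(
    r"""<(?:link|script|img)\b[^>]*?\b(?:href|src)\s*=\s*["']([^"']+)["']""",
    re.IGNORECASE,
)


def foreign_resource_hosts(page_url: str, html: str) -> set[str]:
    """Hosts of embedded resources that differ from the page's own host."""
    page_host = urlsplit(page_url).hostname
    hosts = set()
    for value in RESOURCE_RE.findall(html):
        # Resolve relative and scheme-relative links against the page URL.
        host = urlsplit(urljoin(page_url, value)).hostname
        if host and host != page_host:
            hosts.add(host)
    return hosts
```

A non-empty result corresponds to "resources of other websites are embedded"; those hosts would then be checked against the whitelist.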

2. As a browser plug-in (capturing DNS query requests)

When loading a web page, the browser must request resources such as JS, CSS and images from servers, which involves a series of actions such as DNS queries, sending requests and following redirects. Based on Chrome DevTools, a browser plug-in can be developed that monitors the browser's network behavior in real time, captures the network resource requests issued while the browser loads the page of the URL to be detected, and selects the requests for the three categories JS, CSS and images. The queried domain names are compared with the domain name of the URL to be detected to determine whether query requests for newDomains were issued, i.e. whether phishing is possible.

3. Setting up a local recursive DNS server and analyzing the DNS query log

A local recursive DNS server is set up and configured to record the DNS query requests it receives. To ensure that the DNS query log completely records the DNS query requests the browser issues when loading a page, the computer's DNS client cache is disabled and the DNS client is configured to use only this local recursive DNS server for DNS queries.

A DNS query log usually records only three fields, namely query time, client IP and queried domain name, so it cannot by itself delimit which records belong to the DNS queries issued while the browser loads a particular web page. For this reason, the present invention selects a non-existent domain name in advance and uses the DNS query records for that domain name as separators between the query records of different web pages in the log. The selected domain name is queried before and after each visit to a URL to be detected, so that analysis of the DNS query log can accurately and completely recover the DNS query records issued while the web page to be detected was loading.
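Because the separator domain is queried before and after each visit, the log can be split by treating sentinel queries as alternating open/close markers. A minimal sketch, assuming the log has already been reduced to a chronological list of queried names; the sentinel `phish-probe-marker.invalid` is a hypothetical choice (the reserved `.invalid` top-level domain is guaranteed never to resolve), and `segment_by_sentinel` is a hypothetical name.

```python
SENTINEL = "phish-probe-marker.invalid"  # hypothetical; .invalid never resolves


def segment_by_sentinel(qnames: list[str],
                        sentinel: str = SENTINEL) -> list[list[str]]:
    """Group queried names into per-page segments bracketed by sentinel queries.

    Sentinels alternate: the first opens a segment, the next closes it.
    Queries outside any open segment (background noise) are dropped.
    """
    segments: list[list[str]] = []
    current: list[str] = []
    inside = False
    for name in qnames:
        if name.rstrip(".").lower() == sentinel:
            if inside:
                segments.append(current)
                current = []
            inside = not inside
            continue
        if inside:
            current.append(name)
    return segments
```

Pairing the markers, rather than splitting on every sentinel, means a background query that lands between the closing marker of one page and the opening marker of the next is not attributed to either page.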

Regular expressions are used to match the DNS query log and obtain the DNS query records issued by the page of the URL to be detected. The first record is the query for the URL's own domain name; the rest are DNS queries issued when the page loads resources including, but not limited to, logo images, favicon images, CSS files and JS files. These associated query domain names are then checked against the whitelist to determine whether counterfeiting is possible.
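The regex matching of one page's log segment can be sketched as follows. The log line layout `"<time> <client-ip> <queried-name>"` is hypothetical, since real recursive resolvers each use their own format, and whitelist matching is exact here for brevity. `LINE_RE` and `analyse_page_queries` are hypothetical names.

```python
import re

# Hypothetical layout, one query per line: "<time> <client-ip> <queried-name>"
LINE_RE = re.compile(r"^(?P<time>\S+)\s+(?P<ip>\S+)\s+(?P<qname>\S+?)\.?$")


def analyse_page_queries(lines: list[str],
                         whitelist: set[str]) -> tuple[str, set[str]]:
    """Return the page's own domain and the associated queries found in the whitelist."""
    qnames = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:  # skip lines that do not fit the assumed layout
            qnames.append(match["qname"].lower())
    if not qnames:
        raise ValueError("no DNS query records found")
    # The first record is the query for the page's own domain name.
    page_domain, associated = qnames[0], qnames[1:]
    return page_domain, {name for name in associated if name in whitelist}
```

A non-empty second element marks the page as a highly suspected phishing website.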

Another embodiment of the present invention provides a phishing website identification device, as shown in Figure 4, comprising the units described above, wherein:

The detection unit determines whether resources of other websites are embedded in the website to be detected by detecting whether the web page source code of the website to be detected contains links to resources of other websites; alternatively, the detection unit is a browser plug-in that monitors the browser's network behavior in real time, captures the network resource requests issued while the browser loads a page of the website to be detected, and compares the queried domain names with the domain name of the website to be detected to determine whether DNS query requests for other domain names have been issued, and hence whether resources of other websites are embedded in the website to be detected.

The detection unit may also be a local recursive DNS server set up for this purpose, which analyzes its DNS query log to determine whether the browser issues DNS query requests for other domain names while accessing the website to be detected, and hence whether resources of other websites are embedded in the website to be detected.

The above embodiments are only intended to illustrate, not to limit, the technical solution of the present invention. Those of ordinary skill in the art may modify the technical solution of the present invention or make equivalent substitutions without departing from its spirit and scope; the scope of protection of the present invention shall be defined by the claims.

Claims

1. A phishing website identification method, characterized by comprising the following steps:

detecting whether resources of other websites are embedded in a website to be detected;

if no resources of other websites are embedded in the website to be detected, determining that the website to be detected is a non-phishing website;

if resources of other websites are embedded in the website to be detected, determining whether the domain names of the other websites intersect with a whitelist; if there is no intersection, determining that the website to be detected is a non-phishing website; if there is an intersection, determining that the website to be detected is a highly suspected phishing website;

performing legitimacy determination and domain name credit evaluation on the highly suspected phishing website to determine whether the website to be detected is a phishing website.

2. The method according to claim 1, characterized in that, before detecting whether resources of other websites are embedded in the website to be detected, it is determined whether the domain name of the website to be detected is in the whitelist, and if it is, the website to be detected is directly determined to be a non-phishing website.

3. The method according to claim 1 or 2, characterized in that whether resources of other websites are embedded in the website to be detected is determined by detecting whether the web page source code of the website to be detected embeds links to resources of other websites, or by detecting whether the browser issues DNS query requests for other domain names while accessing the website to be detected.

4. The method according to claim 3, characterized in that determining, by examining the web page source code, whether resources of other websites are embedded in the website to be detected comprises: fetching the web page source code of the website to be detected; using regular expressions to extract the values of the href and src attributes in the code segments of the source code that reference resources, these values being the links that load the corresponding resources, and thereby obtaining the domain names corresponding to the links; and comparing the domain names corresponding to the resource links in the source code with the domain name of the website to be detected, wherein if there is a domain name different from the domain name of the website to be detected, resources of other websites are considered to be embedded in the website to be detected.

5. The method according to claim 3, characterized in that a browser plug-in monitors the browser's network behavior in real time to capture the network resource requests issued while the browser loads a page of the website to be detected, and compares the queried domain names with the domain name of the website to be detected, thereby determining whether DNS query requests for other domain names are issued.

6. The method according to claim 3, characterized in that a local recursive DNS server is set up and its DNS query log is analyzed to determine whether the browser issues DNS query requests for other domain names while accessing the website to be detected.

7. The method according to claim 6, characterized in that the computer's DNS client cache is disabled and the DNS client is configured to use only the local recursive DNS server for DNS queries, so that the DNS query log completely records the DNS query requests issued by the browser when loading a page.

8. The method according to claim 7, characterized in that a non-existent domain name is selected, and the DNS query records for that domain name are used as separators between the query records of different web pages in the DNS query log.

9. A phishing website identification device, characterized by comprising:

a detection unit, configured to detect whether resources of other websites are embedded in a website to be detected;

a first determination unit, configured to determine that the website to be detected is a non-phishing website when no resources of other websites are embedded in it;

a whitelist comparison unit, configured to determine whether the domain names of other websites embedded in the website to be detected intersect with a whitelist;

a second determination unit, configured to determine that the website to be detected is a non-phishing website when the domain names of the other websites do not intersect with the whitelist, and to determine that the website to be detected is a highly suspected phishing website when they do;

an evaluation unit, configured to perform legitimacy determination and domain name credit evaluation on the highly suspected phishing website;

a third determination unit, configured to determine, according to the result obtained by the evaluation unit, whether the website to be detected is a phishing website.

10. The device according to claim 9, characterized in that the detection unit determines whether resources of other websites are embedded in the website to be detected by detecting whether the web page source code of the website to be detected embeds links to resources of other websites; or the detection unit is a browser plug-in that monitors the browser's network behavior in real time, captures the network resource requests issued while the browser loads a page of the website to be detected, and compares the queried domain names with the domain name of the website to be detected to determine whether DNS query requests for other domain names are issued, and thereby whether resources of other websites are embedded in the website to be detected.

11. The device according to claim 9, characterized in that the detection unit is a local recursive DNS server, which analyzes the DNS query log to determine whether the browser issues DNS query requests for other domain names while accessing the website to be detected, and thereby whether resources of other websites are embedded in the website to be detected.