CN104363251B

CN104363251B - Website security detection method and device

Info

Publication number: CN104363251B
Application number: CN201410769106.8A
Authority: CN
Inventors: 龙专
Original assignee: Beijing Qihoo Technology Co Ltd; Qizhi Software Beijing Co Ltd
Current assignee: Secworld Information Technology Beijing Co Ltd; Qax Technology Group Inc
Priority date: 2014-12-12
Filing date: 2014-12-12
Publication date: 2016-09-28
Anticipated expiration: 2034-12-12
Also published as: CN104363251A

Abstract

The present invention relates to a website security detection method comprising the following steps: receiving, through a remote port, collected data that includes a Hypertext Transfer Protocol (HTTP) request packet; using the link contained in the request packet to determine an associated new link belonging to a known specific website; and performing vulnerability scanning on the web page corresponding to the new link. Correspondingly, the present invention also provides a website security detection device. The present invention can discover known specific websites and their new links in a timely manner and scan those new links for vulnerabilities in real time, which avoids missed detections as well as unnecessary scanning of invalid and duplicate links, and thereby maintains website security efficiently and promptly.

Description

Website security detection method and device

Technical field

The invention relates to Internet security technology, and in particular to a website security detection scheme and device.

Background

Website access is exposed to a variety of security risks, such as cookie poisoning, application buffer overflows, cross-site scripting attacks, and known security vulnerabilities. These website security problems in turn endanger user data. Website visitors therefore want to know how secure a website is and naturally prefer safer websites, while website administrators want to fix vulnerabilities promptly, resolve their sites' security problems, and offer visitors a safer browsing platform.

Website security detection usually relies on a scanner that actively fetches web pages with crawler technology and runs security tests on the fetched pages. To keep the crawler from adding load to the website server, the page fetching for security testing is usually run on a schedule or triggered manually by a user. Today, however, the business logic (code) of websites is updated frequently, and the information security staff available to each company cannot support security testing that is this extensive and this frequent. The result is a conflict: frequent scanning increases server load and strains human resources, whereas scanning at intervals delays the security testing of new web pages. Specifically, the shortcomings of the prior art for testing web page security include, but are not limited to, the following problems.

For example, an island page is a page that a crawler cannot reach. If such a page has a vulnerability and a hacker finds it, the security risk is severe. Existing vulnerability scanners all rely on spider technology to collect website links before testing them, so they cannot scan newly launched domain names promptly and cannot detect vulnerabilities in island pages.

As another example, large websites (such as news and e-commerce sites) put a large number of new web pages online every day, and scheduled scanning cannot test these pages promptly. If the administrator schedules a scan at 0:00 every day, a page that goes online at 1:00 is not scanned until 23 hours later. If the new pages contain vulnerabilities, the website is exposed for that entire period.

Summary of the invention

The purpose of the present invention is to overcome one or more aspects of the above problems by providing a website security detection method and device.

To achieve this purpose, the present invention adopts the following technical solutions.

A website security detection method provided by the present invention comprises the following steps:

receiving, through a remote port, collected data that includes a Hypertext Transfer Protocol (HTTP) request packet;

using the link contained in the request packet to determine an associated new link belonging to a known specific website;

performing vulnerability scanning on the web page corresponding to the new link.
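The second step depends on recovering a link from each captured request packet. The following is a minimal sketch of that extraction, not the patent's own implementation. It assumes the packet is a plaintext HTTP/1.x request with a `Host` header and that the scheme is `http`; the function name `link_from_request` is hypothetical.

```python
from typing import Optional


def link_from_request(raw_request: bytes) -> Optional[str]:
    """Rebuild the requested URL from a raw HTTP/1.x request packet.

    Returns None when the request line is malformed or no Host header exists.
    Assumption: plaintext HTTP, so the scheme is taken to be "http".
    """
    # Only the header section is needed; the body (if any) is ignored.
    head = raw_request.split(b"\r\n\r\n", 1)[0].decode("iso-8859-1")
    lines = head.split("\r\n")

    # Request line: METHOD SP request-target SP HTTP-version
    parts = lines[0].split(" ")
    if len(parts) != 3:
        return None
    _method, target, _version = parts

    # An absolute-form target already carries scheme and host.
    if target.startswith("http://") or target.startswith("https://"):
        return target

    # Otherwise combine the Host header with the origin-form target.
    for line in lines[1:]:
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "host":
            return "http://" + value.strip() + target
    return None
```

A caller would pass each request packet taken from the collected data to this function and hand the resulting links to the later deduplication and new-link checks.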

According to one embodiment of the present invention, the source IP address of the collected data is the destination IP address of the request packet. Preferably, the collected data comes from a collection module installed on the device at that source IP address.

According to another embodiment of the present invention, the source IP address of the collected data is the source IP address in the request packet. Preferably, the collected data comes from a browser plug-in installed on the device at that source IP address. Preferably, before the associated new link belonging to the known specific website is determined, the links contained in the request packets are aggregated and duplicate links among them are removed.

According to one embodiment of the present invention, the step of removing duplicate links includes the following sub-steps:

identifying as duplicate links multiple links that are formed by accessing a database and that differ only in their variables;

keeping only one of the duplicate links, thereby removing the duplicates.
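The two sub-steps above can be sketched as follows. This is an illustrative reading, assuming that "differ only in their variables" means the links share host, path, and query parameter names and differ only in the parameter values. The function names are hypothetical.

```python
from urllib.parse import parse_qsl, urlsplit


def link_pattern(url: str) -> tuple:
    """Key that keeps host, path and parameter names but ignores the values."""
    parts = urlsplit(url)
    names = tuple(sorted({name for name, _ in
                          parse_qsl(parts.query, keep_blank_values=True)}))
    return (parts.scheme, parts.netloc.lower(), parts.path, names)


def drop_variable_duplicates(urls):
    """Keep the first link seen for each pattern, preserving input order."""
    seen = set()
    kept = []
    for url in urls:
        key = link_pattern(url)
        if key not in seen:
            seen.add(key)
            kept.append(url)
    return kept
```

With this key, `view.php?id=1` and `view.php?id=2` on the same host collapse into one entry, while `list.php?page=1` is kept as a separate link.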

According to another embodiment of the present invention, the step of removing duplicate links includes the following sub-steps:

identifying multiple links that have the same signature as duplicate links;
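The text does not say how a link's signature is computed. The sketch below assumes one possible definition, a SHA-256 digest over the host, the path, and the sorted query parameter names, purely for illustration. Both the definition and the function names are hypothetical.

```python
import hashlib
from urllib.parse import parse_qsl, urlsplit


def link_signature(url: str) -> str:
    """Hex digest over host, path and parameter names (assumed definition)."""
    parts = urlsplit(url)
    names = sorted({name for name, _ in
                    parse_qsl(parts.query, keep_blank_values=True)})
    canonical = "|".join([parts.netloc.lower(), parts.path, ",".join(names)])
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def group_by_signature(urls):
    """Map each signature to the links that share it, in input order."""
    groups = {}
    for url in urls:
        groups.setdefault(link_signature(url), []).append(url)
    return groups
```

Links that land in the same group would be treated as duplicates. A fixed-length digest is convenient because it can be stored and compared without keeping the full link.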

According to one embodiment of the present invention, the known specific website and/or its new links are given in advance through user settings received via a graphical user interface. Preferably, the settings received by the graphical user interface include a domain name or an IP address pointing to the website.

According to one embodiment of the present invention, a link in a request packet is determined to be an associated new link of a known specific website when the IP address that the link points to is the IP address that the known specific website points to, or falls within the IP address segment to which that address belongs.
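The IP-based check can be sketched with Python's standard `ipaddress` module. The sketch assumes the link's host has already been resolved to an IP address and that the address segments are given in CIDR notation. The function name and the sample addresses (taken from documentation ranges) are hypothetical.

```python
import ipaddress


def belongs_to_known_site(link_ip, known_ips, known_segments):
    """Return True if link_ip equals a known address or lies in a known segment.

    known_ips:      iterable of address strings, e.g. "192.0.2.10"
    known_segments: iterable of CIDR strings, e.g. "192.0.2.0/24"
    """
    addr = ipaddress.ip_address(link_ip)
    if any(addr == ipaddress.ip_address(ip) for ip in known_ips):
        return True
    return any(addr in ipaddress.ip_network(seg) for seg in known_segments)
```

For example, with `known_segments=["192.0.2.0/24"]`, a link resolving to `192.0.2.77` would be attributed to the known website even if its domain name had never been seen before.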

According to one embodiment of the present invention, a link in a request packet is determined to be an associated new link of a known specific website when the registration characteristic information of the link's domain name is the same as the registration characteristic information of the known specific website's domain name.
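As an illustration of this comparison, the sketch below assumes the registration records of both domain names have already been retrieved (for example from a WHOIS lookup) and are available as dictionaries. The field names `registrant` and `registrant_email` are hypothetical stand-ins for whatever registration characteristics are actually compared.

```python
def same_registration(record_a, record_b,
                      fields=("registrant", "registrant_email")):
    """Return True if every compared field is present and equal in both records.

    Values are compared case-insensitively with surrounding whitespace removed.
    A field that is missing or empty never counts as a match.
    """
    def norm(value):
        return (value or "").strip().lower()

    for field in fields:
        a = norm(record_a.get(field))
        b = norm(record_b.get(field))
        if not a or a != b:
            return False
    return True
```

Treating missing fields as a mismatch is a deliberately conservative choice here, so that two domains with hidden registration data are not grouped together by accident.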

Preferably, a list of known specific websites is provided to record the domain names and/or corresponding IP addresses of one or more known specific websites.

Further, the step of using the link contained in the request packet to determine an associated new link belonging to a known specific website includes the following sub-steps:

extracting the links from all request packets that have been obtained;

removing, from the extracted links, duplicate links that point to web pages with the same code;

determining the new links among the remaining links and adding them to a queue to be scanned.
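The last sub-step, deciding which links are new and queuing them, can be sketched as below. The sketch assumes that "new" means "not recorded in a set of links seen so far" and uses exact string comparison for brevity. The names `enqueue_new_links`, `already_seen`, and `scan_queue` are hypothetical.

```python
from collections import deque


def enqueue_new_links(extracted_links, already_seen, scan_queue):
    """Append links not seen before to scan_queue and return how many were added.

    already_seen is updated in place, so a link repeated within the same batch
    or received again later is queued only once.
    """
    added = 0
    for link in extracted_links:
        if link in already_seen:
            continue
        already_seen.add(link)
        scan_queue.append(link)
        added += 1
    return added


# Example: one link was scanned earlier, two arrive in the new batch.
seen = {"http://www.example.com/index.html"}
queue = deque()
enqueue_new_links(
    ["http://www.example.com/index.html", "http://www.example.com/new.html"],
    seen,
    queue,
)
```

After the call, `queue` holds only `http://www.example.com/new.html`, which a scanner would then take from the left end of the deque.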

According to one embodiment of the present invention, the step of performing vulnerability scanning on the web page that the new link points to includes the following sub-steps:

obtaining the new link from the queue to be scanned, in which the new link is recorded;

performing vulnerability scanning on the web page to which the new link directly maps.

According to another embodiment of the present invention, the step of performing vulnerability scanning on the web page corresponding to the new link includes the following sub-steps:

fetching the web page mapped by each new link in the queue to be scanned and adding it to a local web page library;

performing vulnerability scanning on the web pages in the library that were downloaded according to the new links.

Further, the method includes a subsequent step: displaying a graphical user interface to output the results of the vulnerability scanning.

A website security detection device provided by the present invention includes:

a packet capture unit, configured to receive, through a remote port, collected data that includes a Hypertext Transfer Protocol (HTTP) request packet;

a novelty checking unit, adapted to use the link contained in the request packet to determine an associated new link belonging to a known specific website;

a detection unit, configured to perform vulnerability scanning on the web page corresponding to the new link.

According to one embodiment of the present invention, the source IP address of the collected data received by the packet capture unit is the destination IP address of the request packet. Preferably, the collected data comes from a collection module installed on the device at that source IP address.

According to another embodiment of the present invention, the source IP address of the collected data received by the packet capture unit is the source IP address in the request packet. Preferably, the collected data comes from a browser plug-in installed on the device at that source IP address.

Preferably, the novelty checking unit is configured to aggregate the links contained in the request packets and remove duplicate links among them before determining the associated new link belonging to a known specific website.

According to one embodiment of the present invention, the novelty checking unit includes:

a duplicate checking submodule, configured to identify as duplicate links multiple links that are formed by accessing a database and that differ only in their variables;

a removal submodule, adapted to keep only one of the duplicate links, thereby removing the duplicates.

According to another embodiment of the present invention, the novelty checking unit includes:

a duplicate checking submodule, configured to identify multiple links that have the same signature as duplicate links;

According to one embodiment of the present invention, the device further includes a setting unit configured to display a graphical user interface for receiving user settings, by which the known specific website and/or its new links are given in advance. Preferably, the settings received by the graphical user interface include a domain name or an IP address pointing to a specific website.

According to one embodiment of the present invention, the device further includes a setting unit configured to determine that a link in a request packet is an associated new link of a known specific website when the IP address that the link points to is the IP address that the known specific website points to, or falls within the IP address segment to which that address belongs.

According to one embodiment of the present invention, the device further includes a setting unit configured to determine that a link in a request packet is an associated new link of the known specific website when the registration characteristic information of the link's domain name is the same as the registration characteristic information of the known specific website's domain name.

Preferably, the device further includes a list of known specific websites for recording the domain names and/or corresponding IP addresses of one or more known specific websites.

Further, the novelty checking unit includes:

an extraction module, configured to extract the links from all request packets that have been obtained;

a deduplication module, configured to remove, from the links extracted by the extraction module, duplicate links that point to web pages with the same code;

an adding module, configured to determine the new links among the remaining links and add them to a queue to be scanned.

According to one embodiment of the present invention, the detection unit includes:

an acquiring unit, configured to obtain the new link from the queue to be scanned, in which the new link is recorded;

an implementation unit, configured to perform vulnerability scanning on the web page mapped by the new link.

According to another embodiment of the present invention, the detection unit includes:

a downloading unit, configured to download the web page mapped by each new link in the queue to be scanned and add it to a local web page library;

an implementation unit, configured to perform vulnerability scanning on the web pages in the library that were downloaded according to the new links.

Further, the device includes a display unit configured to display a graphical user interface that outputs the results of the vulnerability scanning.

Compared with the prior art, the present invention has at least the following advantages:

1. The present invention receives collected data containing HTTP request packets through a remote port, which enables a remote, distributed architecture similar to client/server (C/S). It can therefore receive the request packets generated by requests sent to the servers of known specific websites and use them to run targeted vulnerability scans on the web pages of those websites and their associated links. First, the invention screens new links specifically for known specific websites and scans only those. Second, it receives collected data in real time through the remote port, and the number of new links identified in real time is very small compared with the total number of links of all known specific websites. Links that are not new have normally been scanned already and need not be scanned again, so scanning the new links requires little computation and puts very little load on the server. The invention thus provides the technical basis for scanning, in real time, the web pages that newly launched links of a specific website point to. It avoids the security incidents that can result from pages going unscanned during the gaps between scheduled or ad hoc scans, and gives network administrators a more effective vulnerability detection tool.

2. The present invention further controls the origin of the collected data by requiring the source IP address of the collected data to equal either the source IP address or the destination IP address contained in the request packet. The former case suits installation on the developer's side, where requests issued by developers during web page debugging are captured so that pages are scanned even earlier. The latter case suits installation on the server that provides the web service, where all externally initiated request packets can likewise be captured as soon as they arrive. For a website, the first request for a newly launched link is generally issued by the network administrator for debugging. Even if the administrator does not debug it, once the page has been uploaded to the server, its first access must still be a request packet sent to the server for the web page that the new link points to. Under the above restriction, the request packets obtained by the invention come from the server or from the client that the administrator uses for debugging, which every access to the page must pass through. In the vast majority of cases the invention can therefore obtain the request packets for the newest web pages, and in theory it can cover all web pages, including island pages. Yet only the request packets that belong to new links are ultimately scanned for vulnerabilities. The present invention thus avoids the drawback of the prior art, in which a full scan is needed every time to avoid missed pages, and achieves comprehensive security scanning in a more lightweight way.

3. The present invention further removes duplicate links among the new links, which reduces repeated scanning of web pages that are essentially the same code. For links such as news pages and forum pages this is a substantial optimization with a very high deduplication rate, which further reduces wasted computation during vulnerability scanning and improves the overall efficiency of the machine.

4. The request packets used by the present invention can be obtained through a browser plug-in added to the browser of the requesting party, or through a client installed on the server that hosts the known specific website, among other ways. The overall architecture is flexible and open, which facilitates secondary development.

5. The present invention allows users to add known specific websites through a graphical user interface and also provides a way for the program itself to determine known specific websites dynamically. It can also issue corresponding warnings after vulnerability scanning, giving it strong interactivity and good human-computer interaction.

In summary, the present invention provides a more comprehensive and efficient technical solution for website security detection.

Additional aspects and advantages of the invention will be set forth in part in the description that follows, and in part will become apparent from that description or be learned through practice of the invention.

Brief description of the drawings

The above and/or additional aspects and advantages of the present invention will become apparent and easy to understand from the following description of the embodiments in conjunction with the accompanying drawings, in which:

Fig. 1 is a schematic diagram of the website security detection device of the present invention connected to an existing network topology;

Fig. 2 is a schematic diagram of an existing network topology derived from Fig. 1;

Fig. 3 is a schematic flowchart of one embodiment of a network security detection method of the present invention;

Fig. 4 is a schematic flowchart of the sub-steps of step S12 of a network security detection method of the present invention;

Fig. 5 is a schematic flowchart of another embodiment of a network security detection method of the present invention;

Fig. 6 is a schematic diagram of the principle of one embodiment of a network security detection device of the present invention;

Fig. 7 is a schematic diagram of the principle of another embodiment of a network security detection device of the present invention;

Fig. 8 is a schematic structural diagram of the novelty checking unit in a network security detection device of the present invention.

Detailed description

Embodiments of the present invention are described in detail below, and examples of the embodiments are shown in the drawings, in which the same or similar reference numerals throughout denote the same or similar elements or elements having the same or similar functions. The embodiments described below with reference to the drawings are exemplary; they serve only to explain the present invention and should not be construed as limiting it.

Those skilled in the art will understand that, unless expressly stated otherwise, the singular forms "a", "an", "said" and "the" used herein may also include the plural forms. It should be further understood that the word "comprising" used in this specification refers to the presence of the stated features, integers, steps, operations, elements and/or components, but does not exclude the presence or addition of one or more other features, integers, steps, operations, elements, components and/or groups thereof. It should be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element, or intervening elements may be present. In addition, "connected" or "coupled" as used herein may include wireless connection or wireless coupling. The expression "and/or" used herein includes all or any one of, and all combinations of, one or more of the associated listed items.

Those skilled in the art will understand that, unless otherwise defined, all terms used herein (including technical and scientific terms) have the same meaning as commonly understood by those of ordinary skill in the art to which this invention belongs. It should also be understood that terms such as those defined in general-purpose dictionaries should be interpreted as having a meaning consistent with their meaning in the context of the prior art and, unless specifically defined as herein, should not be interpreted in an idealized or overly formal sense.

Those skilled in the art will understand that the terms "terminal" and "terminal device" as used herein include both devices that have only a wireless signal receiver without transmitting capability and devices that have receiving and transmitting hardware capable of two-way communication over a bidirectional communication link. Such devices may include: cellular or other communication devices with a single-line display, a multi-line display, or no multi-line display; PCS (Personal Communications Service) devices, which may combine voice, data processing, fax and/or data communication capabilities; PDAs (Personal Digital Assistants), which may include a radio frequency receiver, pager, Internet/intranet access, web browser, notepad, calendar and/or GPS (Global Positioning System) receiver; and conventional laptop and/or palmtop computers or other devices that have and/or include a radio frequency receiver. A "terminal" or "terminal device" as used herein may be portable, transportable, or installed in a vehicle (air, sea and/or land), or may be adapted and/or configured to operate locally and/or in distributed form at any other location on Earth and/or in space. It may also be a communication terminal, an Internet access terminal, or a music/video playback terminal, for example a PDA, a MID (Mobile Internet Device) and/or a mobile phone with music/video playback capability, or a device such as a smart TV or a set-top box.

Those skilled in the art will understand that the concepts of server, cloud, and remote network device used herein are equivalent, and include but are not limited to a computer, a network host, a single network server, a set of multiple network servers, or a cloud composed of multiple servers. Here, a cloud consists of a large number of computers or network servers based on cloud computing, where cloud computing is a form of distributed computing in which a group of loosely coupled computers forms one super virtual computer. In the embodiments of the present invention, the remote network device, the terminal device, and the WNS server may communicate by any means, including but not limited to mobile communication based on 3GPP, LTE, or WIMAX, computer network communication based on the TCP/IP or UDP protocols, and short-range wireless transmission based on Bluetooth or infrared transmission standards.

Those skilled in the art should understand that "application", "application program", "application software" and similar expressions used in the present invention denote the same concept well known to those skilled in the art: computer software, suitable for electronic execution, that is organized from a series of computer instructions and related data resources. Unless otherwise specified, this naming is not limited by the type or level of programming language, nor by the operating system or platform on which the software runs. Naturally, the concept is also not limited to any form of terminal.

The method and device of the present invention can be implemented as software and installed on a computer to run, thereby forming a website detection device. To explain the embodiments further, it helps to first understand how enterprise website servers are deployed. Each enterprise may have one or more websites, and each website may be distributed across one or more servers. In general, as shown in Fig. 1, a simple enterprise website may connect its servers 81 and 82 directly to one switch 80 to provide service. In a more complex network topology, as shown in Fig. 2, multiple servers 81 and 82 may be connected to different switches 80. A device on which the software of the present invention is installed does not need to capture request packets directly at the switch 80. Instead it receives, through a remote port, collected data that other functional modules have captured and encapsulated, and that contains the HTTP request packets sent to the server hosting a known specific website. The device therefore does not depend on the machine room; it can connect directly to the Internet and open the pre-agreed remote port to receive the data collected by the clients. As described later, the client may take two forms, a client program and a browser plug-in, which are installed respectively on the server hosting the known specific website and in the browser of a computer terminal used to debug web pages.

Figure 3 presents an embodiment of the present invention as a flow of steps. This embodiment is a concrete implementation of the core technique of the website security detection method and includes the following steps:

Step S11: receive, through a remote port, collected data containing Hypertext Transfer Protocol request packets.

As described above, the present invention can collect the request packets through a C/S-like distributed structure. Specifically, the remote port is the communication port agreed between the software implementing the website security detection method (installed on a service host, that is, the detection device of the present invention) and its clients; those skilled in the art can readily implement such communication using the well-known TCP/IP protocols. After a client collects HTTP request packets, it sends them to the software over the public network (or, equally, over a local area network). Following the basic principles of computer communication, data exchanged between the client and the detection device is expressed in messages as the basic data unit, so the collected data containing the HTTP request packets is also transmitted as messages. Each message carries its source IP address and destination IP address: the source is the IP address of the computer on which the client runs, and the destination is the IP address of the detection device. Each request packet likewise carries a source IP address and a destination IP address: the source is the IP address of the computer that initiated the request, and the destination is the IP address of the server hosting the website being accessed. Consequently, by comparing the source IP address of a received collected-data message with the source and destination IP addresses of the request packets it contains, the detection device can determine whether the data came from the requesting client that sent the request to a known specific website's server, or from the server that provides that website to the requester. The source of the request packets can thus be identified, and the links in those packets can be put to further, targeted use. Identifying the source in this sense neither requires nor excludes code that extracts and compares, at run time, the source IP address of the collected data and the source and destination IP addresses of the request packets. The description here only points out the logical relationship between the source IP address of the collected data and the source or destination IP address of the request packets; those skilled in the art should fully understand this and may choose among these technical means freely.
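The comparison of IP addresses described above can be sketched as follows. This is only an illustration under stated assumptions: the `RequestPacket` type, its field names and the `classify_reporter` function are hypothetical and do not come from the patent, which leaves the implementation open.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RequestPacket:
    """Hypothetical record for one collected HTTP request packet."""
    src_ip: str  # IP of the computer that initiated the HTTP request
    dst_ip: str  # IP of the server hosting the requested website
    url: str     # link carried by the request


def classify_reporter(message_src_ip: str, packet: RequestPacket) -> str:
    """Decide which side of the request reported this packet.

    message_src_ip is the source IP of the collected-data message
    received on the remote port.
    """
    if message_src_ip == packet.dst_ip:
        return "server"   # collection module running on the web server
    if message_src_ip == packet.src_ip:
        return "client"   # browser plug-in on the requesting machine
    return "unknown"      # neither relationship holds
```

A caller might use the result to decide how the links in the packet are used afterwards; the three labels returned here are illustrative only.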

The collected data received by the detection device may contain a single request packet or multiple request packets encapsulated using the communication protocol. Those skilled in the art can set this flexibly; setting it by time interval is particularly suitable, in which case the number of request packets carried in each transmission need not be the same. For example, the client may use 10 minutes as the time unit, continuously collect request packets, and then package them into the collected data sent to the detection device. Whether more or fewer requests are initiated within the period does not affect the implementation of the present invention.

For website access, Hypertext Transfer Protocol (HTTP) request packets take two forms: GET requests and POST requests. Although they differ, both are processed by the present invention. In general, the fields of an HTTP request packet mainly include: protocol, server domain name, port number, request path, GET parameter names, POST parameter names, extension, target server network segment, and so on. Both GET and POST request packets contain the URL of the web page. A web page's URL, that is, its hyperlink, follows an agreed format from the domain name down to the page: the end of the link describes the resource it points to, and everything before it is the path. For example, in the URL http://www.360.cn/test/admin.php, http:// indicates the protocol, www.360.cn is the domain name, test is a directory on the website, and admin.php is the resource page pointed to; relative to the admin.php page, http://www.360.cn/test/ is the path of the link. http://www.360.cn/test/admin/admin.php is evidently a deeper link than http://www.360.cn/test/admin.php.
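The decomposition of a link into protocol, domain name, path and resource can be illustrated with Python's standard `urllib.parse`. The `LinkParts` type and the `split_link` function are names chosen for this sketch, not part of the patent, and the `depth` field is simply a count of directory levels added to show what "deeper link" means.

```python
import posixpath
from typing import NamedTuple
from urllib.parse import urlsplit


class LinkParts(NamedTuple):
    scheme: str       # protocol, e.g. "http"
    domain: str       # domain name, e.g. "www.360.cn"
    path_prefix: str  # everything before the resource
    resource: str     # resource page at the end of the link
    depth: int        # number of directory levels under the domain


def split_link(url: str) -> LinkParts:
    parts = urlsplit(url)
    # Separate the directory part from the final resource name.
    directory, resource = posixpath.split(parts.path)
    directory = directory.rstrip("/") + "/"
    depth = len([seg for seg in directory.split("/") if seg])
    prefix = f"{parts.scheme}://{parts.netloc}{directory}"
    return LinkParts(parts.scheme, parts.hostname or "", prefix, resource, depth)
```

For the example above, `split_link("http://www.360.cn/test/admin.php")` yields the path prefix `http://www.360.cn/test/` and the resource `admin.php`, and the link under `/test/admin/` has a larger `depth`.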

Depending on the implementation, the collected data containing HTTP request packets can be obtained from clients in either of the following ways, or in a combination of them:

1. Install, on the server providing the website access service, a client-program collection module that collects the request packets of web page access requests sent to that server.

Based on the analysis above, a collection module can be developed. The collection module is a client-program process running on the server. The client program is installed on a server that provides website access, in particular a server hosting what the present invention calls a known specific website. Once the client program is running, the request packets of all access requests targeting that server are collected by it. At regular intervals (or in real time), the client program forms these request packets into collected data, in the format agreed with the software of the present invention on the detection device, and sends it to the remote port. After receiving the collected data through the remote port, the present invention parses it to obtain the HTTP request packets. In this case, the destination IP address of an externally initiated request packet is the IP address of this server, and the source IP address of the message carrying the collected data is also the IP address of this server. Using this correspondence, the detection device can recognize that the collected data came from the server responsible for responding to the request packets it contains.

2. Install a browser plug-in in the browser of a device that sends requests to the server providing access to a known specific website.

Similarly, a browser plug-in can be developed and installed on a computer terminal used for online debugging of the web pages of the known specific website. Once the browser is running and a link is used to visit a web page, the plug-in obtains the request packet generated by that visit and, as in the first approach, forms it into collected data sent to the detection device through the remote port. After receiving the plug-in's collected data, the detection device parses it to obtain the HTTP request packets. In this case, the source IP address of a request packet initiated by the browser is the IP address of the computer on which the browser runs, and the source IP address of the message carrying the collected data is also that computer's IP address. Using this correspondence, the detection device can recognize that the collected data came from the client that initiated the request.

Those skilled in the art should appreciate that the collection module and the browser plug-in both essentially implement the function of obtaining request packets. Both are computer programs, differing only in form and application details. How to obtain request packets programmatically is known in the prior art; for brevity it is not detailed here, and those skilled in the art can draw on the prior art to practice it. It follows that the collection module can also be implemented on the client computer that initiates the requests.

These two ways of obtaining request packets are proposed for different application needs. Whichever is used, the HTTP request packets in the collected data can be extracted using existing techniques so that they can be processed further.

Step S12: use the links contained in the request packets to determine associated new links belonging to known specific websites.

The websites targeted by the present invention are specific. They are generally one or more known websites of the enterprise applying the method, and they share some common features: their links all resolve to certain specific IP address ranges; their domain names are owned by the enterprise or by its customers; or they are target websites that the enterprise helps to manage. More specifically, this relationship identifies the websites that software implementing the method needs to attend to. At the technical level, whether a website is one of these is judged by the method of the present invention: it can be set manually through an interface, or judged comprehensively on the basis of links and/or IP addresses and/or domain name registration feature information. The basis for identifying a known specific website therefore should not be understood as merely a domain name or its IP address. It also covers objects that, although not explicitly set by a person, are in substance objects the enterprise intends to include in detection, including any link with a newly added domain name that resolves to an IP address already occupied by some known specific website.

It follows that, compared with crawler technology, the present invention does not need carefully selected seed URLs, but it does need some basic settings about specific websites in order to define the known specific websites. In line with the description above, these known specific websites can be set in various ways. Whether what is provided is an IP address or a resource locator such as a domain name, specifying a known specific website essentially amounts to providing a link to the website, so this process is in essence also the process of determining the new links of the present invention. Several specific methods for determining known specific websites and/or their new links are disclosed below:

1. Setting known specific websites and/or their associated new links through a graphical user interface.

Specifically, when software implementing the present invention runs for the first time, it presents a graphical user interface through which the user sets some known specific websites. The user completes the setting by entering content related to these websites, thereby specifying one or more known specific websites in advance. The content may be one or more domain names, such as so.com or 360.cn, or IP addresses corresponding to servers, as well as contiguous IP address ranges or discrete IP address intervals made up of IP addresses. As noted above, each such setting can essentially be understood as an associated new link and can be stored in a list of known specific websites for later use by the method. This list of known specific websites is in effect also a link library, so it can be used as a link library in subsequent processing or treated as a data source for one. The link library referred to here, as in crawler technology, can be used directly as the subsequent to-be-scanned queue, or can merely supply base data for that queue. On this basis, the domain names or IP addresses and related information used to determine some known specific websites constitute new links of the present invention, or can at least be used to construct them, and become the objects processed when the software performs its first scan. The same approach can be used during later maintenance to keep adding new links; when the domain name of a new link differs from the domain names of the other known specific websites, this in effect adds a new known specific website by extending the set of domain names.

2. Using domain name registration information to determine associated new links of known specific websites.

The associated new links of known specific websites include all links under websites that are already registered (identifiable because they contain a registered domain name) and/or all links of websites whose domain names have not been registered. The latter case arises when a link obtained from a request packet in this step contains a new domain name and falls outside the link scope of the existing known specific websites. It is then unclear whether the link belongs to one of the enterprise's own websites and whether it should be treated as an associated new link of a known specific website, so technical means are needed to decide. The new domain name in the link can be queried through the interface provided by a domain name registration website to obtain its registration feature information, such as the domain name owner and the domain name filing number. This information is compared with the registration feature information of the domain names of the existing known specific websites. If the two match, the new link is treated as an associated new link of a known specific website and used in the method; otherwise the request packet is discarded without processing. The new domain name and/or the new lower-level link can then be added directly to the list of known specific websites described above for later use. The query for registration feature information can be performed manually or by software. In the former case it is effectively follow-up maintenance of the first method. In the latter case it gives the present invention dynamic extension and maintenance of the list of known specific websites. If the list of known specific websites is itself the link library or the to-be-scanned queue, this amounts to maintaining a list of new links, which naturally serves as the data basis for several related processing stages described later.
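The registration-based check can be sketched as below. Everything here is an assumption made for illustration: the patent names no particular registration interface, so the lookup is passed in as a caller-supplied function; the `(owner, filing number)` tuple is a hypothetical shape for the registration feature information; and host names are matched exactly rather than reduced to a registrable domain.

```python
from typing import Callable, Dict, Optional, Tuple
from urllib.parse import urlsplit

# Hypothetical shape of registration feature information:
# (domain owner, filing number).
Registration = Tuple[str, str]


def is_associated_by_registration(
    url: str,
    known_sites: Dict[str, Registration],
    lookup: Callable[[str], Optional[Registration]],
) -> bool:
    """Return True if the link belongs to a known specific website.

    `lookup` stands in for the registration query interface; it returns
    None when no registration record is found.
    """
    domain = urlsplit(url).hostname
    if domain is None:
        return False
    if domain in known_sites:
        return True  # domain already recorded
    info = lookup(domain)
    if info is not None and info in known_sites.values():
        # Same registration features as an existing known site:
        # record the new domain so later links need no further query.
        known_sites[domain] = info
        return True
    return False  # caller discards the request packet
```

Recording the new domain on a match mirrors the dynamic extension of the list described above; a real deployment would also need to decide how strictly the registration fields must match.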

3. Using IP addresses to dynamically determine associated new links of known specific websites.

As is well known, domain names map to IP addresses, so the IP address corresponding to a known domain name can be determined. One website may be served by servers at several IP addresses, so websites and IP addresses may be related one-to-many or many-to-many. In practice, enterprise websites usually deploy their servers on IP address ranges made up of consecutive IP addresses. The IP address ranges occupied by the existing known specific websites can therefore be determined. When the link in a request packet contains a new domain name that is not one of the domain names of the existing known specific websites, the IP address the new domain name points to can be checked against the IP addresses occupied by the existing known specific websites. If it is one of them, the link can likewise be treated as an associated new link of a known specific website and added to the list of known specific websites described above. Similarly, if that list is itself the link library or the to-be-scanned queue, this processing amounts to maintaining a list of new links, which naturally serves as the data basis for several related processing stages described later.
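A minimal sketch of the IP-based check, using Python's standard `ipaddress` module. Expressing the known address ranges in CIDR notation is an assumption of this sketch, as are the function names; the resolver is injected as a parameter so the logic can be exercised without network access (in practice something like `socket.gethostbyname` could be passed in).

```python
import ipaddress
from typing import Callable, Iterable, List, Union

Network = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]


def build_ranges(cidrs: Iterable[str]) -> List[Network]:
    """Parse the address ranges occupied by the known specific websites."""
    return [ipaddress.ip_network(c) for c in cidrs]


def ip_in_known_ranges(ip: str, ranges: Iterable[Network]) -> bool:
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in ranges)


def is_associated_by_ip(
    domain: str,
    ranges: Iterable[Network],
    resolve: Callable[[str], str],
) -> bool:
    """Check whether a new domain resolves into a known address range."""
    try:
        ip = resolve(domain)
    except OSError:
        return False  # resolution failed; treat as not associated
    return ip_in_known_ranges(ip, ranges)
```

For example, with ranges built from `["198.51.100.0/24"]`, a new domain resolving to `198.51.100.77` would be treated as associated, while one resolving to `192.0.2.5` would not.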

It follows that one key difference between the present invention and crawler technology is that the present invention has definite known specific websites. These can be given manually at initialization or identified and added dynamically by software implementing the method, without the strict dependence on seed URLs found in crawler technology. Moreover, these known specific websites are in essence a series of links. They can be maintained independently in a list, the list can serve as the link library, or the list can even serve directly as the to-be-scanned queue. How the list is used is simply a matter of flexibly applying database techniques within the method, which is obvious to those skilled in the art. For example, in one arrangement the list of known specific websites is itself the to-be-scanned queue: new links are appended to the list in order with a flag marking them as unscanned, and the flag is changed to scanned after scanning. In another arrangement the list is independent and mainly records each domain name and its corresponding IP addresses, and a separate to-be-scanned queue is provided: when an associated new link is identified, its domain name is added to the list and the link itself is added to the queue, and any later link containing that domain name is added directly to the queue without being resolved again. In yet another arrangement, the list of known specific websites, the link library and the to-be-scanned queue are all independent of one another: the list stores only the domain names related to known specific websites, the link library stores all identified links related to known specific websites, and the to-be-scanned queue stores only the new links obtained from the link library. This keeps each type of data independent and supports more complex uses.
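The first arrangement, in which the list doubles as the to-be-scanned queue and each link carries a scanned flag, might look like the sketch below. The `ScanQueue` class and its method names are hypothetical; the patent leaves the storage design to the implementer.

```python
from typing import Dict, List


class ScanQueue:
    """List of links that also serves as the to-be-scanned queue."""

    def __init__(self) -> None:
        # Maps each link to its flag: False = unscanned, True = scanned.
        # A dict preserves insertion order, so links stay in arrival order.
        self._scanned: Dict[str, bool] = {}

    def add(self, link: str) -> bool:
        """Append a link as unscanned; return False if already recorded."""
        if link in self._scanned:
            return False
        self._scanned[link] = False
        return True

    def pending(self) -> List[str]:
        """Links still waiting to be scanned, in the order they were added."""
        return [link for link, done in self._scanned.items() if not done]

    def mark_scanned(self, link: str) -> None:
        if link not in self._scanned:
            raise KeyError(link)
        self._scanned[link] = True
```

Keeping the flag next to the link means a link is scanned at most once even if it is reported repeatedly; the other two arrangements would split this structure into separate stores.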

As stated above, any of the three methods can be used not only to determine the known specific websites but is also, in essence, the process by which the present invention determines whether a link is an associated new link of a known specific website. To simplify the description that follows, note that, in line with one of the arrangements above, the list of known specific websites is treated below as identical to the to-be-scanned queue disclosed later. This simplification should nevertheless be enough for those skilled in the art to extend it to scenarios that use a link library to store valid links.

With the concept of a known specific website understood from the disclosure above, those skilled in the art should be able to carry out this step. Furthermore, the methods given above for determining known specific websites, and for deciding whether a link is an associated new link of one, will help those skilled in the art understand the more detailed embodiments of this step. These two levels in fact give two variants of this step at different levels; the technical means of using the links contained in the request packets to determine associated new links belonging to known specific websites has therefore been fully disclosed.

To further demonstrate the advantages of the invention, the sub-steps of this step are disclosed below as another embodiment. Referring to Figure 4, the sub-steps are:

Step S121: extract the links from all request packets obtained.

After gathering all the request packets obtained, the software implementing the method extracts links from them. Because an HTTP request packet contains the URL of the web page, the corresponding link, that is, the URL of the web page, can be recovered from it. Well-known technical analyses can first be applied to these links, such as checking whether each is a valid link.

A valid link is one that opens a web page or downloads a file normally. An invalid link is one whose page is no longer valid and cannot provide users with any useful information. A link is judged invalid when it has no domain name, an incomplete domain name, an incomplete link, or, for a POST request, a packet with no content. Taking a link whose domain name is abcd.com as an example: if the domain name abcd.com does not appear in the link, or only part of it appears, such as ad.com, the link is invalid.

Each link obtained from a request packet is analyzed to decide whether it is valid. If the link has no domain name, an incomplete domain name, an incomplete link, or a POST packet with no content, it is judged invalid and takes no part in subsequent processing; otherwise it is a valid link, and only valid links are processed further.
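The validity rules above can be approximated as in the following sketch. The function name, its parameters and the exact tests are assumptions for illustration: in particular, "incomplete link" is interpreted here as a missing `http`/`https` scheme or host, and "incomplete domain name" as a host that neither equals the expected domain nor is a subdomain of it.

```python
from urllib.parse import urlsplit


def is_valid_link(url: str, expected_domain: str,
                  method: str = "GET", body: bytes = b"") -> bool:
    """Apply the invalid-link criteria to one extracted link."""
    try:
        parts = urlsplit(url)
    except ValueError:
        return False  # malformed link
    host = parts.hostname
    if parts.scheme not in ("http", "https") or not host:
        return False  # incomplete link or no domain name
    if host != expected_domain and not host.endswith("." + expected_domain):
        return False  # domain missing or only partly present
    if method.upper() == "POST" and not body:
        return False  # POST packet with no content
    return True
```

With `expected_domain="abcd.com"`, a link on `www.abcd.com` passes while a link on `ad.com` is rejected, matching the example given earlier.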

Step S122: remove, from the extracted links, duplicate links that point to web pages with the same code.

Each extracted link, chiefly each valid link, essentially points to a web page of the corresponding known specific website, but the valid links may still contain many duplicates. Duplicate links are links that point to web pages with the same code; the pages merely receive different database access variables, so they differ in link content, but their vulnerability points are exactly the same.

For example, two valid links may share the same beginning and end in /a.php?=1 and /a.php?=2 respectively. The two links differ only in the data fetched from the database; "1" and "2" can be regarded as variables, so the links differ only in their variables. In this case any one of the links reaches the page the others point to, so only one needs to be kept. Alternatively, the trailing variable can be removed, changing the end of the link to /a.php, and all related links carrying variables deleted, with the same effect. Duplicate-link pages of this kind are common in forums.

As another example, link endings such as /data/2011201 and /data/2011202 are common on news websites. Here 2011201 and 2011202 should likewise be regarded as variables; apart from these variables the two links are textually identical, so they are in essence two duplicate links pointing to web pages with the same code.
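One way to treat such links as duplicates is to reduce each link to a template with its variable part removed and keep one link per template. The sketch below makes two simplifying assumptions that are not in the patent: the query string is always a variable, and a final path segment made only of digits is a variable. The function names are illustrative.

```python
import re
from typing import Dict, Iterable, List
from urllib.parse import urlsplit, urlunsplit


def link_template(url: str) -> str:
    """Strip the variable parts of a link, leaving its fixed skeleton."""
    parts = urlsplit(url)
    segments = parts.path.split("/")
    if re.fullmatch(r"[0-9]+", segments[-1]):
        segments[-1] = "{var}"  # e.g. /data/2011201 -> /data/{var}
    # Drop the query string and fragment entirely.
    return urlunsplit((parts.scheme, parts.netloc, "/".join(segments), "", ""))


def dedupe_by_template(links: Iterable[str]) -> List[str]:
    """Sort the links and keep the first link seen for each template."""
    kept: Dict[str, str] = {}
    for link in sorted(links):
        kept.setdefault(link_template(link), link)
    return list(kept.values())
```

Applied to the two examples above, the pair ending in /a.php?=1 and /a.php?=2 collapses to one link, as does the pair ending in /data/2011201 and /data/2011202. A real implementation would need a more careful rule for what counts as a variable.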

To improve the computational efficiency of the present invention, those skilled in the art should remove duplicates from the extracted links by means including well-known techniques. To help those skilled in the art implement the invention, two duplicate-removal methods devised by the present invention are listed below for reference; they may be used alternatively or together:

Method 1: first sort the links and compare adjacent links. When links are found to differ only in their variables while the rest of their content is identical, they are determined to be multiple links produced by database access that differ only in their variables, and hence duplicate links. Only one of the duplicates is kept and the rest are deleted.

Method 2: sort the links, then compare the signatures of the web pages that adjacent links point to. When the signatures are the same, the links are judged to be duplicates; only one of them is kept and the others are deleted.

The sorting in the two methods above, and the comparison of adjacent links, are not mandatory; those skilled in the art may substitute any known algorithm that makes the comparison more efficient, and the details are omitted here. After deduplication, each remaining link points to a web page with a reasonable degree of uniqueness, which clearly improves the efficiency of the subsequent steps.
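The two deduplication methods can be sketched in Python roughly as follows. This is an illustration under stated assumptions, not the patented implementation: the rule for what counts as a "variable" (query values and purely numeric path segments), the function names, and the use of a SHA-256 hash of the page body as the "signature" are all assumptions, since the description does not define them.

```python
import hashlib
import re
from urllib.parse import urlsplit


def normalize(url: str) -> str:
    """Return a pattern for the URL with its variable parts masked.

    Assumption: query-string values and purely numeric path segments
    (e.g. /data/2011201) are the "variables" of the link.
    """
    parts = urlsplit(url)
    path = re.sub(r"/\d+(?=/|$)", "/{var}", parts.path)
    # Keep only the parameter names; the values are treated as variables.
    names = sorted(pair.split("=", 1)[0] for pair in parts.query.split("&") if pair)
    return f"{parts.scheme}://{parts.netloc}{path}?{'&'.join(names)}"


def dedupe_by_pattern(urls):
    """Method 1: sort, then keep one link per variable-masked pattern."""
    kept = {}
    for url in sorted(urls):
        kept.setdefault(normalize(url), url)
    return list(kept.values())


def dedupe_by_signature(pages):
    """Method 2: keep one link per page signature.

    `pages` maps each link to the bytes of the page it points to.
    Assumption: the signature is a SHA-256 hash of those bytes.
    """
    kept = {}
    for url in sorted(pages):
        signature = hashlib.sha256(pages[url]).hexdigest()
        kept.setdefault(signature, url)
    return list(kept.values())
```

A dictionary keyed by pattern or signature replaces the "compare adjacent links" step; sorting is kept only so that the retained link is chosen deterministically.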

Step S123: determine the associated new links among the links processed in the previous step, and add those new links to the queue to be scanned.

As noted above, determining a new link is in essence determining whether the link is associated with a known specific website that already exists. The associated new links belonging to known specific websites therefore include not only links whose domain names, IP addresses, or more specific links have already been recorded in the list of known specific websites (the queue to be scanned), but also links whose domain names do not appear in the list while the IP addresses they map to either are already recorded in the list or fall within an IP address segment or segment range formed by the recorded addresses. Determining associated new links in this step thus means flexibly applying the three methods disclosed above for determining what belongs to a known specific website or to its associated new links. The three methods may be used singly or in any combination. The first, manual registration, suits registering a website domain name; afterwards, every specific link under that domain that has not yet been scanned (identifiable, as described above, by a status flag in the link library or in the queue to be scanned) is treated as an associated new link of that website. The second registers websites using domain name registration feature information; whether done by manual query or by program, it achieves the same effect as the first, but the programmatic form is the key to this step, because it raises the degree of intelligence and automation of the program. The third checks whether the IP address that a request packet link points to coincides with an IP address pointed to by a link in the current list of known specific websites, or falls within a continuous IP address segment formed by such addresses, and on that basis decides whether to treat the request packet link as an associated new link belonging to a known specific website. This approach extends the list of known specific websites automatically: if the list is kept separately, the domain name of the new link can be added to it while the new link itself is added to the link library (if any) and to the queue to be scanned; if the list also serves as the queue to be scanned, adding the new link to the list is the same as adding it to the queue.
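The third method, checking whether the IP address behind a link falls within the address ranges already recorded for known specific websites, might look like the sketch below. The function name and the CIDR notation for the recorded segments are assumptions; the addresses are reserved documentation ranges, not real site data. Resolving a domain name to its IP address is left out so the example has no network dependency.

```python
import ipaddress


def belongs_to_known_site(ip: str, known_networks) -> bool:
    """Return True if `ip` falls inside any recorded address segment.

    `known_networks` is an iterable of CIDR strings; a single recorded
    address is written as a /32 network.
    """
    addr = ipaddress.ip_address(ip)
    return any(addr in ipaddress.ip_network(net) for net in known_networks)


# Hypothetical recorded segments for the known specific websites.
KNOWN_NETWORKS = ["203.0.113.0/24", "198.51.100.7/32"]
```

A link whose domain is not yet listed but whose address passes this check would be treated as an associated new link, and its domain could then be appended to the list of known specific websites.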

After the valid links have been filtered for new links by any of the approaches disclosed above, what remains is the full set of new links. (Where necessary, these new links can also serve as seed URLs for crawler-based expansion to further new links.) To facilitate the subsequent steps, the new links are added to the queue to be scanned described above. Whether the queue shares a table with the list of known specific websites, additionally shares a table with the link library, or is a separate table, those skilled in the art can, as noted above, use ordinary knowledge to register all determined new links in the queue and subsequently scan only those new links for vulnerabilities.

Step S13: perform vulnerability scanning and detection on the web pages corresponding to the new links.

Once the preceding steps, in any of their variants, have determined all new links from the request packet links, the web pages corresponding to these new links can be scanned for vulnerabilities in a batch. Batching here generally means periodic in time: user requests keep arriving, so the method keeps obtaining and analyzing request packets, and it cannot wait until users stop sending requests before it starts scanning. This step is therefore related to the other steps only logically, and that logical order does not preclude interleaving them in time. For example, new links can be determined while previously determined new links are being scanned: one process continually receives request packets, determines new links, and stores them in the queue to be scanned, while another process continually scans the new links in that queue. However the other steps are implemented, this step only needs to look at the new links in the queue to be scanned; likewise, however this step is implemented, what the preceding steps ultimately provide is a queue that stores new links. The queue to be scanned is thus the interface between this step and the preceding ones, a principle those skilled in the art should be aware of.
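The arrangement in which one worker determines new links while another scans them, with the queue to be scanned as the only interface between the two, can be sketched as follows. The description speaks of two processes; this sketch uses two threads and an in-memory `queue.Queue` for brevity, which is an assumption, as are the function names `run_pipeline`, `is_new_link`, and `scan`.

```python
import queue
import threading


def run_pipeline(links, is_new_link, scan):
    """Determine new links and scan them concurrently via a shared queue."""
    to_scan = queue.Queue()   # the queue to be scanned
    results = []
    done = object()           # sentinel marking the end of input

    def producer():
        # Keeps identifying new links and storing them in the queue.
        for link in links:
            if is_new_link(link):
                to_scan.put(link)
        to_scan.put(done)

    def consumer():
        # Keeps scanning whatever new links the queue holds.
        while True:
            link = to_scan.get()
            if link is done:
                break
            results.append(scan(link))

    workers = [threading.Thread(target=producer), threading.Thread(target=consumer)]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return results
```

In a long-running deployment the producer would never send the sentinel; it is included here only so the example terminates.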

The correspondence between a new link and its web page, as the term is used in the present invention, may be the direct mapping from the new link to the page on the website server by way of the domain-name-to-IP-address relationship, or it may be the indirect one-to-one correspondence that results when that page is downloaded and stored in a local web page library. To suit these two kinds of correspondence, either of the following two approaches can be used to scan the web pages that the determined new links point to.

Approach 1: obtain a new link from the queue to be scanned, send a request to the web server of the online page to which the link maps directly, and scan the page the server returns. This increases the load and processing time on the server hosting the new link, but it modestly reduces the computation required of software implementing this method.

Approach 2: first use the new links recorded in the queue to be scanned to download the web pages they map to directly (the download can proceed as in Approach 1), add those pages to a local web page library, and then scan each page in the library. Alternatively, as described above, two processes can be run: one continually downloads the online pages mapped by the new links into the local library, and the other continually scans the newly downloaded pages in the library.

Whichever of these approaches is used to scan the new links in the queue to be scanned, the vulnerability detection result that the present invention aims to achieve is not affected.

Vulnerability scanning is carried out using website security detection data together with website security detection rules. The website security detection data includes at least one of the following: web Trojan data, fraud data, search-blocking data, side-injection data, tampering data, and vulnerability data. Based on this data, the website is checked according to the corresponding website security detection rules, which include at least one of the following: web Trojan rules, fraud rules, blocking rules, side-injection rules, tampering rules, and vulnerability rules. The present invention mainly uses the vulnerability rules to scan web pages. A vulnerability rule is used to determine, from the vulnerability data, the vulnerabilities present on a website.

Security detection of a website according to the vulnerability rules and based on the vulnerability data comprises: obtaining vulnerability features from a pre-stored vulnerability feature database and judging whether the vulnerability data matches a vulnerability feature; if it matches, a vulnerability is determined, and if not, it is determined not to be a vulnerability. The vulnerabilities present on the website are determined from the judgment results. A vulnerability feature may be a vulnerability keyword. For example: the web page status code 404 may be used as the vulnerability keyword; or the content of the 404 page may be used; or a normal page of the website is accessed and its page content, status code, and HTTP headers are extracted, a non-existent page of the website is accessed and the page content, status code, and HTTP headers of the returned page are extracted, and the two sets are compared to obtain a 404 keyword used as the vulnerability keyword; or a non-existent page is accessed and the page content, status code, and HTTP headers of the returned page are used as the vulnerability keyword. The present invention places no limitation on this.
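The third variant, comparing the response for a normal page with the response for a non-existent page to obtain a 404 keyword, could be sketched as below. The `Response` type, the function names, and the word-level comparison of page content are assumptions chosen for illustration; the description does not specify how the comparison is performed, and fetching the two pages is left to the caller.

```python
from dataclasses import dataclass


@dataclass
class Response:
    status: int     # web page status code
    headers: dict   # HTTP headers
    body: str       # web page content


def derive_404_features(normal: Response, missing: Response) -> dict:
    """Collect what distinguishes the 'missing page' response from a normal one."""
    features = {}
    if missing.status != normal.status:
        features["status"] = missing.status
    # Words that appear only on the page returned for a non-existent URL.
    features["keywords"] = set(missing.body.lower().split()) - set(normal.body.lower().split())
    # Headers whose values differ from those of the normal page.
    features["headers"] = {
        name: value for name, value in missing.headers.items()
        if normal.headers.get(name) != value
    }
    return features


def matches_404(resp: Response, features: dict) -> bool:
    """Return True if `resp` shows the derived 404 features."""
    if "status" in features and resp.status == features["status"]:
        return True
    words = set(resp.body.lower().split())
    return bool(features["keywords"]) and features["keywords"] <= words
```

Comparing against a normal page matters for sites that answer missing URLs with status 200 and a custom error page, where the status code alone would not serve as a keyword.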

Through the steps above, the method of the present invention completes the task of security detection for a website and stores the vulnerability scan results in a file or database for other uses. Further, to achieve better human-computer interaction, the present invention may optionally perform the following step, with reference to the embodiment shown in FIG. 5:

Step S14: display a graphical user interface to output the result information of the vulnerability scanning and detection.

Because the method is suited to implementation as a program, the program can provide a graphical user interface. After the preceding steps complete the vulnerability scan, the detection results are analyzed and tallied, and the processed result information is output to the graphical user interface, giving the network administrator a clear overview and making it easier to patch the web page vulnerabilities.

Having disclosed in detail several embodiments of the method of the present invention, the following describes, in modular terms, embodiments of the corresponding device that implements the method, so that those skilled in the art can understand the present invention more thoroughly. Note that the concepts and principles used in the method apply equally to the corresponding device, so some explanations are abbreviated below.

Referring to FIG. 6, the website security detection device of the present invention is deployed on a computer used as a security detection device and includes a packet capture unit 11, a novelty checking unit 12, and a detection unit 13, and optionally, as in the embodiment shown in FIG. 7, a display unit 14.

The packet capture unit 11 is configured to receive, through a remote port, collected data containing hypertext transfer protocol request packets.

The collected data received by the detection device may contain a single request packet or multiple request packets encapsulated using a communication protocol. This can be set flexibly by those skilled in the art and is particularly suited to being set by time interval, so the number of request packets in each transmission need not be the same. For example, the client uses 10 minutes as its time unit: it collects request packets continuously, then packages them into the collected data and transmits it to the detection device. Whether many or few requests are initiated within the period does not affect implementation of the present invention. For website access, hypertext transfer protocol (HTTP) request packets take two forms, GET requests and POST requests; although different, both are processed by the present invention. The format of an HTTP request packet generally includes: protocol, server domain name, port number, request path, GET parameter names, POST parameter names, extension, target server network segment, and so on. Both GET and POST request packets contain the URL of the web page. The URL of a web page, that is, its hyperlink, follows an agreed format from domain name through to page: the end of the link describes the resource it points to, and everything before that is its path. Take http://www.360.cn/test/admin.php: http:// indicates the protocol, www.360.cn is the domain name, test is a directory on the website, and admin.php is the resource page pointed to; relative to the admin.php page, http://www.360.cn/test/ is the path of the link. http://www.360.cn/test/admin/admin.php is evidently a deeper link than http://www.360.cn/test/admin.php.
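The decomposition of the example link into protocol, domain name, path, and resource can be reproduced with Python's standard `urllib.parse` module. The function name `split_link` and the dictionary keys are assumptions made for this sketch.

```python
import posixpath
from urllib.parse import urlsplit


def split_link(url: str) -> dict:
    """Break a link into the parts named in the description."""
    parts = urlsplit(url)
    directory, resource = posixpath.split(parts.path)
    return {
        "protocol": parts.scheme,
        "domain": parts.hostname,
        "port": parts.port,  # None when the link gives no explicit port
        # Everything before the resource, i.e. the path of the link.
        "path": f"{parts.scheme}://{parts.netloc}{directory.rstrip('/')}/",
        "resource": resource,
        "extension": posixpath.splitext(resource)[1],
    }


info = split_link("http://www.360.cn/test/admin.php")
```

For the example link, `info["path"]` is `http://www.360.cn/test/` and `info["resource"]` is `admin.php`, matching the breakdown given in the text.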

Those skilled in the art will appreciate that the collection module and the browser plug-in both essentially perform the function of obtaining request packets. Both are computer programs and differ only in form and in application details. How to obtain request packets programmatically is known in the prior art; for brevity it is not detailed here, and those skilled in the art can draw on the prior art to put it into practice. It follows that the collection module can also be implemented on the client computer that initiates the requests. The two different ways of obtaining request packets are proposed to meet different application needs. Whichever is used, the HTTP request packets in the collected data can be extracted with existing techniques so that they can be processed further.

The novelty checking unit 12 is adapted to use the links contained in the request packets to determine associated new links belonging to known specific websites.

The websites targeted by the present invention are specific. They are generally one or more known websites belonging to the enterprise that uses the device of the present invention, and they share certain characteristics: their links all resolve to certain specific IP address segments, and their domain names are all owned by the enterprise or its customers, or they are target websites that the enterprise helps to manage. More specifically, "specific" refers to the websites that the software implementing this device needs to watch. Whether a website is one that the software needs to watch is, at the technical level, judged by the device of the present invention, either through settings entered manually in an interface or through a combined judgment based on links and/or IP addresses and/or domain name registration feature information. The basis for identifying a known specific website should therefore not be understood as merely a particular domain name or its IP address. It also covers anything that has not been explicitly configured by hand but is in substance a detection target the enterprise intends to include, such as any link under a newly added domain name that resolves to an IP address already occupied by some known specific website.

It follows that, unlike crawler technology, the present invention does not require carefully chosen seed URLs, but it does need a setting unit 120 (see FIG. 8) that provides basic settings for certain specific websites so as to define the known specific websites of the present invention. In line with the description above, there are various ways to set these known specific websites. Specifying a known specific website, whether by an IP address or by a resource locator such as a domain name, amounts to specifying a link to the website, so this process is in essence also a process of determining the new links of the present invention. Several specific embodiments of the setting unit 120 for determining known specific websites and/or their new links are disclosed below:

1. The setting unit 120 may be configured to use a graphical user interface to set known specific websites and/or their associated new links.

Specifically, when software implementing the present invention runs for the first time, the setting unit 120 presents a graphical user interface in which the user configures some known specific websites. The user completes the configuration by entering content related to these websites, thereby specifying one or more known specific websites in advance. The content entered may be one or more domain names, such as so.com or 360.cn, or the IP addresses of the corresponding servers, or continuous IP address segments or discrete ranges of IP address segments formed from IP addresses. As noted above, each such setting can in essence be understood as an associated new link and can be stored in a list of known specific websites for other functional modules of the device to use. This list of known specific websites is also effectively a link library, so it can be used as a link library later on or treated as a data source for one. The link library referred to here, as in crawler technology, can be used directly as the subsequent queue to be scanned or can simply supply base data for that queue. On this basis, the domain names, IP addresses, and related information used to identify some known specific websites constitute the new links of the present invention, or can at least be used to construct them, and become the objects processed in the software's first scan. When this approach is used in later maintenance to keep adding new links, a new link whose domain name differs from those of the other known specific websites in effect adds a new known specific website by extending the set of domain names.

2. The setting unit 120 may be configured to use domain name registration information to determine the associated new links of known specific websites.

The associated new links of known specific websites include all links under websites that are already registered (identifiable because they contain a registered domain name) and/or all links of websites whose domain names are not yet registered. The latter case arises when a link obtained from a request packet contains a new domain name and does not fall within the link scope of any existing known specific website, so that it cannot be determined whether the link belongs to one of the enterprise's own websites and whether it should be treated as an associated new link. Technical means are then needed to decide. The new domain name in the link can be queried through the interface provided by a domain name registration website to obtain its registration feature information, such as the domain owner and the domain filing number, and this information is compared with the registration feature information of the existing known specific website domain names. If the two are the same, the new link is treated as an associated new link of a known specific website and used in the device; otherwise the request packet is discarded and not processed. The new domain name and/or the new links under it can then be added directly to the list of known specific websites described above for later use. The query of registration feature information for a new domain name can be performed manually or by software. In the former case it is in effect follow-up maintenance of the first approach described above. In the latter case it gives the present invention dynamic extension and maintenance of the list of known specific websites. If that list is itself the link library or the queue to be scanned, then what is being maintained is in essence a list of new links, which naturally serves as the data basis for the several related processing stages described later.
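The registration-based check can be sketched as a comparison of registration feature records. No real registry interface is called here: `lookup` is a hypothetical function supplied by the caller that returns a tuple such as (owner, filing number) for a domain, or `None` if nothing is found. The sample records are invented placeholders.

```python
def is_associated_by_registration(new_domain, known_domains, lookup) -> bool:
    """Return True if `new_domain` shares registration features with a known domain.

    `lookup` is a hypothetical callable standing in for a query to a
    domain registration service; it is an assumption of this sketch.
    """
    new_info = lookup(new_domain)
    if new_info is None:
        return False
    return any(lookup(domain) == new_info for domain in known_domains)


# Placeholder registration records used only for demonstration.
SAMPLE_RECORDS = {
    "known.example": ("Example Corp", "FILING-0001"),
    "new.example": ("Example Corp", "FILING-0001"),
    "other.example": ("Someone Else", "FILING-9999"),
}
```

With `SAMPLE_RECORDS.get` passed as `lookup`, `new.example` would be accepted as belonging to the same owner as `known.example`, while `other.example` would be discarded.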

3. The setting unit 120 is configured to use IP addresses to dynamically determine the associated new links of known specific websites.

It follows that one key difference between the present invention and crawler technology is that the present invention works with definite known specific websites, which can be specified manually at initialization or identified and added dynamically by software equipped with the device, without the strict dependence on seed URLs that crawler technology has. These known specific websites are in essence a series of links. They can be maintained independently in a list, the list can be used as a link library, or the list can even be used directly as the queue to be scanned. How the list is used is simply a matter of applying database techniques flexibly within the device, which is obvious to those skilled in the art. In one arrangement, the list of known specific websites is itself the queue to be scanned: new links are appended in order with a flag marking them as unscanned, and the flag is changed to mark them as scanned once they have been scanned. In another arrangement, the list is independent and mainly records each domain name and its IP address, while a separate queue to be scanned is kept: when an associated new link is identified, its domain name is added to the list and the link itself is added to the queue, and thereafter any link containing that domain name no longer needs to be resolved and is added to the queue directly. In a third arrangement, the list of known specific websites, the link library, and the queue to be scanned are all independent of one another: the list stores only the domain names related to known specific websites, the link library stores all identified links related to known specific websites, and the queue stores only the new links obtained from the link library. This keeps each type of data independent and can support more complex uses.
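The first arrangement, in which a single list doubles as the queue to be scanned and each link carries a scanned or unscanned flag, might be modeled as follows. The class and member names are assumptions, and an in-memory dictionary stands in for whatever database table an implementation would use.

```python
from enum import Enum


class Status(Enum):
    UNSCANNED = "unscanned"
    SCANNED = "scanned"


class ScanQueue:
    """A list of links in insertion order, each tagged with a scan status."""

    def __init__(self):
        self._status = {}  # dicts preserve insertion order

    def add(self, link: str) -> bool:
        """Append a link as unscanned; return False if already recorded."""
        if link in self._status:
            return False
        self._status[link] = Status.UNSCANNED
        return True

    def pending(self):
        """Links that still carry the unscanned flag, oldest first."""
        return [link for link, status in self._status.items() if status is Status.UNSCANNED]

    def mark_scanned(self, link: str) -> None:
        """Flip the flag once the link has been scanned."""
        self._status[link] = Status.SCANNED
```

Because already recorded links are rejected by `add`, the same structure also answers the question of whether a link is new.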

As described above, the three embodiments of the setting unit 120 can each be used not only to determine the known specific websites of the present invention but also, in essence, to determine the associated new links belonging to them. To simplify the description that follows, note that it adopts one of the arrangements above, in which the list of known specific websites is treated as identical to the queue to be scanned disclosed later. This simplification should nonetheless be enough for those skilled in the art to extend it to scenarios that use a link library to store valid links.

With the above disclosure and an understanding of the concept of known specific websites, those skilled in the art should be able to implement the novelty checking unit 12. Further, the several forms of setting unit 120 given above, for determining known specific websites and the associated new links belonging to them, help those skilled in the art understand more detailed embodiments of the novelty checking unit 12. These two levels in effect provide variants of the novelty checking unit 12 at two different levels, so the technical means of using the links contained in the request packets to determine the associated new links belonging to known specific websites has been fully disclosed.

To further demonstrate the advantages of the invention, the internal structure of the novelty search unit 12 in another embodiment is disclosed below. Referring to Fig. 8, the novelty search unit 12 further includes an extraction module 121, a deduplication module 122, and an adding module 123:

The extraction module 121 is configured to extract the links from all obtained request packets.

After the software implementing this device has collected all obtained request packets, the extraction module 121 extracts the links from them. Because an HTTP request packet contains the URL of a webpage, the corresponding link, that is, the URL of the webpage, can be recovered from the packet. Well-known technical analysis can first be applied to these links, for example to check whether each is a valid link.
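
As an illustration of how a link might be recovered from a request packet, the sketch below rebuilds the URL from the request line and the `Host` header of a raw HTTP/1.1 request. The function name `link_from_request` is hypothetical, and the `scheme` parameter is an assumption: the request text alone does not say whether the page was served over HTTP or HTTPS.

```python
from typing import Optional


def link_from_request(raw_request: str, scheme: str = "http") -> Optional[str]:
    """Rebuild the requested URL from the request line and the Host header."""
    head = raw_request.split("\r\n\r\n", 1)[0]  # drop any message body
    lines = head.split("\r\n")
    parts = lines[0].split(" ")
    if len(parts) != 3:
        return None  # not a well-formed request line
    _method, target, _version = parts
    if target.startswith(("http://", "https://")):
        return target  # absolute-form target already carries the full URL
    host = None
    for line in lines[1:]:
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "host":
            host = value.strip()
            break
    if host is None or not target.startswith("/"):
        return None
    return f"{scheme}://{host}{target}"
```

A request that lacks a `Host` header, or whose request line is malformed, yields `None`, so it can be discarded before the validity analysis mentioned above.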

The deduplication module 122 is configured to remove, from the extracted links, duplicate links that point to webpages with the same code.

To improve the computational efficiency of the invention, those skilled in the art should remove duplicates from the extracted links by means that include known techniques. The deduplication module 122 further includes a duplicate-checking submodule, which identifies duplicate links, and a removal submodule, which removes them. To help those skilled in the art implement the invention, two optional structures of the deduplication module 122 are listed below for reference:

Structural form one: the duplicate-checking submodule first sorts the links and compares adjacent links. When it finds links that differ only in their variables and are otherwise identical, it treats them as multiple links produced by database access that differ only in their variables, and therefore identifies them as duplicate links. In this case, the removal submodule keeps only one of the duplicate links and deletes the rest.
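
One way to realize structural form one is sketched below. It is an assumption-laden illustration: "variables" is read here as the values of URL query parameters, the helper names `link_shape` and `drop_variable_duplicates` are hypothetical, and a dictionary keyed on the link's shape is used in place of adjacent comparison. Sorting is kept only so that the retained link is deterministic.

```python
from urllib.parse import parse_qsl, urlsplit


def link_shape(link: str) -> tuple:
    """Everything about a link except the values of its query variables."""
    parts = urlsplit(link)
    names = tuple(sorted(
        name for name, _ in parse_qsl(parts.query, keep_blank_values=True)
    ))
    return (parts.scheme, parts.netloc.lower(), parts.path, names)


def drop_variable_duplicates(links):
    """Keep one link per shape; links differing only in values are duplicates."""
    kept = {}
    for link in sorted(links):
        kept.setdefault(link_shape(link), link)  # the first link seen wins
    return list(kept.values())
```

For example, `item.php?id=1` and `item.php?id=2` on the same host share a shape, so only one of them survives.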

Structural form two: the duplicate-checking submodule first sorts the links and compares the signatures of the webpages that adjacent links point to. When the signatures are the same, it identifies those links as duplicate links. The removal submodule then keeps only one of them and deletes the others.
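
Structural form two can be sketched in a similar way. The patent does not say how a webpage signature is computed; the example below assumes a SHA-256 digest of the page body and assumes the page bodies have already been fetched into a dictionary. The names `page_signature` and `drop_signature_duplicates` are hypothetical.

```python
import hashlib


def page_signature(page_body: bytes) -> str:
    """Assumed signature: SHA-256 digest of the raw page body."""
    return hashlib.sha256(page_body).hexdigest()


def drop_signature_duplicates(pages: dict) -> list:
    """pages maps link -> page body; keep one link per distinct signature."""
    first_link_for = {}
    for link in sorted(pages):
        first_link_for.setdefault(page_signature(pages[link]), link)
    return sorted(first_link_for.values())
```

An exact digest treats pages as duplicates only when they are byte-for-byte identical; a looser similarity signature would be a different design choice.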

The sorting and adjacent-link comparison in the two structural forms above are not mandatory; those skilled in the art may substitute any known algorithm that improves the comparison, which is not detailed here. Once duplicates are removed, each remaining link points to a reasonably unique webpage, which helps improve the execution efficiency of the other functional modules of the device.

The adding module 123 is configured to determine the associated new links among the links processed by the novelty search unit 12 and to add those new links to the queue to be scanned.

As mentioned above, determining a new link is in essence determining whether the link is associated with a known specific website that already exists. The associated new links belonging to a known specific website therefore include not only links matching the domain names, IP addresses, or more specific links already recorded in the list of known specific websites (the queue to be scanned), but also links whose domain names do not appear in the list while the IP addresses they map to are already recorded in the list or fall within an IP address segment, or range of segments, formed by the IP addresses recorded there. Determining the associated new links in the adding module 123 is thus a matter of flexibly using (calling) the setting unit 120 instances disclosed above. The three structural examples of the setting unit 120 can be applied flexibly: only one may be selected, or any number of them at the same time.

The first, manual registration, is suited to registering a website domain name, after which every specific link under that domain name that has not yet been scanned (identifiable, as described above, by a status flag in the link library or in the queue to be scanned) is regarded as a new link of that website. The second registers websites using domain name registration characteristic information; whether done by manual query or by a program, it has the same effect as the first, but the programmatic form is the key one that the adding module 123 can adopt, since it makes the program more intelligent and automated. The third compares the IP address that the link in a request packet points to against the IP addresses pointed to by links in the current list of known specific websites, or against the continuous IP address segment they form, to decide whether to treat that link as an associated new link belonging to a known specific website. This approach can extend the list of known specific websites automatically. If the known specific websites are kept in a separate list, the domain name of the new link can be added to that list, and the new link itself added to the link library (if any) and to the queue to be scanned. If the list of known specific websites also serves as the queue to be scanned, adding the new link directly to that list is the same as adding it to the queue to be scanned.
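
The third approach, membership in a known IP address segment, can be sketched with Python's standard `ipaddress` module. The function name `belongs_to_known_site`, the CIDR notation for the segments, and the injectable `resolve` callable are assumptions for illustration; the patent does not prescribe how the segment is represented or how the domain name is resolved.

```python
import ipaddress
import socket
from urllib.parse import urlsplit


def belongs_to_known_site(link, known_ranges, resolve=socket.gethostbyname):
    """True if the link's host resolves into one of the known IP segments."""
    host = urlsplit(link).hostname
    if host is None:
        return False
    try:
        address = ipaddress.ip_address(resolve(host))
    except (OSError, ValueError):
        return False  # resolution failed or did not yield an IP address
    return any(address in ipaddress.ip_network(r) for r in known_ranges)
```

Passing the resolver in as a parameter lets the check be exercised without network access, for example with a dictionary lookup standing in for DNS.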

After the valid links have been screened for new links using the setting unit 120 instances disclosed above, the result is the full set of new links (if necessary, crawler technology can treat these new links as seed URLs to discover further new links). To facilitate the other functional modules of the invention, these new links are added to the queue to be scanned described above. Whether the queue to be scanned shares a table with the list of known specific websites, further shares a table with the link library, or is a separate table, those skilled in the art can, as noted above, use ordinary knowledge to register all determined new links in the queue and subsequently perform vulnerability scanning only on those new links.

The detection unit 13 is configured to perform vulnerability scanning and detection on the webpages corresponding to the new links.

After the flexible variants above have been applied and all new links have been determined from the links in the request packets, the detection unit 13 can perform vulnerability scanning and detection on the corresponding webpages in a batch. Such batches are generally periodic in time. Because user requests keep arriving, the device can continuously obtain and analyze request packets, and it cannot wait until users stop sending requests before it starts scanning. The detection unit 13 is therefore only connected to the other functional modules, and this connection should not be read as excluding interleaving in time. For example, new links can be determined while previously determined new links are being scanned: one process continuously receives request packets, determines new links, and stores them in the queue to be scanned, while another process continuously scans the new links in the queue. However the other functional modules are implemented, the detection unit 13 only needs to attend to the new links in the queue to be scanned; likewise, however the detection unit 13 is implemented, the interface the preceding modules ultimately provide is a queue to be scanned that stores new links. The queue to be scanned is thus the interface between the detection unit 13 and the preceding functional modules, a principle those skilled in the art should recognize.
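
The interleaving described above, with the queue to be scanned as the only interface between discovery and scanning, can be sketched as a producer and a consumer. The text speaks of two processes; this example uses two threads and `queue.Queue` purely to stay short and self-contained. The function `run_pipeline` and the `scan` callable are hypothetical placeholders, not the patent's scanner.

```python
import queue
import threading


def run_pipeline(incoming_links, scan):
    """Discover new links and scan them concurrently through a shared queue."""
    scan_queue = queue.Queue()  # the only interface between the two workers
    results = {}
    seen = set()
    done = object()             # sentinel telling the scanner to stop

    def discover():
        for link in incoming_links:
            if link not in seen:  # only links not seen before are new
                seen.add(link)
                scan_queue.put(link)
        scan_queue.put(done)

    def scan_worker():
        while True:
            link = scan_queue.get()
            if link is done:
                break
            results[link] = scan(link)

    workers = [threading.Thread(target=discover),
               threading.Thread(target=scan_worker)]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return results
```

Because the scanner reads only from the queue, either side can be replaced without the other needing to change, which mirrors the role the text gives to the queue to be scanned.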

The correspondence between a new link and its webpage, as referred to in the present invention, can be either the direct mapping from the new link, via the relationship between domain name and IP address, to the corresponding webpage on the website server, or the indirect one-to-one correspondence that results when that webpage is downloaded and stored in a local webpage library. To suit these two kinds of correspondence, two structural examples of the detection unit 13 are provided; either can be used to perform vulnerability scanning and detection on the webpage that a new link determined by the invention points to.

Structural example one: an acquisition unit obtains the new links recorded in the queue to be scanned. For the online webpage that each new link maps to directly, a request is sent to its website server, and an implementation unit performs vulnerability scanning and detection on the webpage the server returns. This approach increases the load and processing time on the server hosting the new link, but it can save some computation in the software that implements the device.

Structural example two: after an acquisition unit obtains the new links from the queue to be scanned, a download unit uses them to download the webpages they map to directly (the download method can be the same as in structural example one) and adds these webpages to a local webpage library. An implementation unit then performs vulnerability scanning and detection on each webpage in the local webpage library. Alternatively, as described above, two processes can be used: one continuously downloads the online webpages mapped by the new links into the local webpage library, and the other continuously scans the newly downloaded webpages in the library.
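
Structural example two separates downloading from scanning, which might look like the sketch below. Everything here is illustrative: `build_local_library` and `scan_library` are hypothetical names, `fetch` stands in for whatever download routine is used, and `check` is a placeholder for a real vulnerability test, which the patent does not specify.

```python
def build_local_library(new_links, fetch):
    """Download each new link's page into an in-memory local page library."""
    library = {}
    for link in new_links:
        try:
            library[link] = fetch(link)
        except OSError:
            continue  # unreachable pages are skipped, not scanned
    return library


def scan_library(library, check):
    """Apply a placeholder check to every page stored in the local library."""
    return {link: check(body) for link, body in library.items()}
```

Scanning the local copies avoids repeated requests to the website server, at the cost of storing the pages.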

Through the above steps, the device of the present invention completes the task of security detection for a website and stores the results of the vulnerability scan in a corresponding file or database for other uses. Furthermore, for better human-computer interaction, the present invention may optionally include a display unit 14:

The display unit 14 is configured to display a graphical user interface that outputs the result information of the vulnerability scanning and detection.

The display unit 14 is configured to provide a graphical user interface. After the detection unit 13 completes the vulnerability scanning and detection, the results are analyzed and counted, and the processed result information is output to the graphical user interface, so that the network administrator can see it at a glance and patch the webpage vulnerabilities.
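
The analysis and counting step that precedes display could be as simple as the following sketch. The input format, a list of `(link, vulnerability_type)` pairs, and the function name `summarize` are assumptions for illustration; the patent does not define the structure of the scan results.

```python
from collections import Counter


def summarize(findings):
    """Count findings per vulnerability type and per link for display."""
    findings = list(findings)
    by_type = Counter(kind for _, kind in findings)
    by_link = Counter(link for link, _ in findings)
    return {
        "total": len(findings),
        "by_type": dict(by_type),
        "by_link": dict(by_link),
    }
```

A graphical user interface could then render these counts as a table or chart for the network administrator.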

In summary, the present invention can discover known specific websites and their new links in time and can perform vulnerability detection on these new links in real time, which avoids missed detections. It also avoids redundant detection of invalid and duplicate links, and so offers efficient and timely maintenance of website security.

Embodiments of the invention disclose:

A1. A website security detection method, characterized in that it comprises the following steps:

A2. The website security detection method according to claim A1, wherein the source IP address of the collected data is the destination IP address of the request packet.

A3. The network security detection method according to claim A2, wherein the collected data comes from a collection module installed on the device at the source IP address.

A4. The website security detection method according to claim A1, wherein the source IP address of the collected data is the source IP address in the request packet.

A5. The website security detection method according to claim A4, wherein the collected data comes from a browser plug-in installed on the device at the source IP address.

A6. The website security detection method according to claim A1, wherein, before the associated new links belonging to a known specific website are determined, the links contained in the request packets are collected and duplicate links among them are removed.

A7. The website security detection method according to claim A6, wherein the step of removing duplicate links comprises the following sub-steps:

A8. The website security detection method according to claim A6, wherein the step of removing duplicate links comprises the following sub-steps:

A9. The website security detection method according to claim A1, wherein the known specific website and/or its new links are given in advance by receiving user settings through a graphical user interface.

A10. The website security detection method according to claim A8, wherein the settings received by the graphical user interface include a domain name or an IP address pointing to a website.

A11. The website security detection method according to claim A1, wherein a link in the request packet is determined to be an associated new link belonging to a known specific website by determining that the IP address the link points to belongs to the IP address pointed to by the known specific website or to the IP address segment to which that address belongs.

A12. The website security detection method according to claim A1, wherein a link in the request packet is determined to be an associated new link belonging to a known specific website when comparison shows that the registration characteristic information of the link's domain name is the same as that of the domain name of the known specific website.

A13. The website security detection method according to claim A1, wherein a list of known specific websites is provided for recording the domain names and/or corresponding IP addresses of one or more of the known specific websites.

A14. The website security detection method according to claim A1, wherein the step of using the links contained in the request packet to determine the associated new links belonging to a known specific website comprises the following sub-steps:

A15. The website security detection method according to claim A1, wherein the step of performing vulnerability scanning on the webpage corresponding to the new link comprises the following sub-steps:

Performing vulnerability scanning and detection on the webpage mapped by the new link.

A16. The website security detection method according to claim A1, wherein the step of performing vulnerability scanning on the webpage corresponding to the new link comprises the following sub-steps:

Obtaining the webpages mapped by the new links in the queue to be scanned and adding them to a local webpage library;

A17. The website security detection method according to claim A1, wherein the method comprises a subsequent step of displaying a graphical user interface to output the result information of the vulnerability scanning and detection.

B18. A website security detection device, characterized in that it comprises:

B19. The website security detection method according to claim B18, wherein the source IP address of the collected data obtained by the packet capture unit is the destination IP address of the request packet.

B20. The network security detection method according to claim B19, wherein the collected data comes from a collection module installed on the device at the source IP address.

B21. The website security detection method according to claim B18, wherein the source IP address of the data collected by the packet capture unit is the source IP address in the request packet.

B22. The website security detection method according to claim B21, wherein the collected data comes from a browser plug-in installed on the device at the source IP address.

B23. The website security detection device according to claim B18, wherein the novelty search unit is configured to collect the links contained in the request packets and remove duplicate links among them before determining the associated new links belonging to a known specific website.

B24. The website security detection device according to claim B23, wherein the novelty search unit comprises:

B25. The website security detection device according to claim B23, wherein the novelty search unit comprises:

B26. The website security detection device according to claim B18, wherein the device further comprises a setting unit configured to display a graphical user interface to receive user settings, whereby the known specific website and/or its new links are given in advance.

B27. The website security detection device according to claim B26, wherein the settings received by the graphical user interface include a domain name or an IP address pointing to a website.

B28. The website security detection device according to claim B18, wherein the device further comprises a setting unit configured to determine a link in the request packet to be an associated new link belonging to a known specific website by determining that the IP address the link points to belongs to the IP address pointed to by the known specific website or to the IP address segment to which that address belongs.

B29. The website security detection device according to claim B18, wherein the device further comprises a setting unit configured to determine a link in the request packet to be an associated new link belonging to the known specific website when comparison shows that the registration characteristic information of the link's domain name is the same as that of the domain name of the known specific website.

B30. The website security detection device according to claim B18, wherein the device further comprises a list of known specific websites for recording the domain names and/or corresponding IP addresses of one or more of the known specific websites.

B31. The website security detection device according to claim B18, wherein the novelty search unit comprises:

B32. The website security detection device according to claim B18, wherein the detection unit comprises:

B33. The website security detection device according to claim B18, wherein the detection unit comprises:

B34. The website security detection device according to claim B18, wherein the device comprises a display unit configured to display a graphical user interface to output the result information of the vulnerability scanning and detection.

It should be noted that the algorithms and formulas provided herein are not inherently related to any particular computer, virtual system, or other device. Various general-purpose systems can also be used with the teachings herein. The structure required to construct such systems is apparent from the description above. Furthermore, the present invention is not directed at any particular programming language. It should be understood that the invention described herein can be implemented in a variety of programming languages, and that the above descriptions of specific languages are given to disclose the best mode of the invention.

Numerous specific details are set forth in the description provided here. It is understood, however, that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures, and techniques have not been shown in detail so as not to obscure the understanding of this description.

Similarly, it should be understood that, to streamline the disclosure and aid understanding of one or more of the various inventive aspects, the features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof in the above description of exemplary embodiments. This manner of disclosure, however, is not to be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the claims reflect, inventive aspects lie in fewer than all features of a single embodiment disclosed above. The claims following the detailed description are therefore expressly incorporated into that detailed description, with each claim standing on its own as a separate embodiment of the invention.

Those skilled in the art will understand that the modules in the device of an embodiment can be changed adaptively and arranged in one or more devices different from that embodiment. The modules, units, or components of the embodiments may be combined into a single module, unit, or component, and may also be divided into multiple sub-modules, sub-units, or sub-components. Except where at least some of such features and/or processes or units are mutually exclusive, all features disclosed in this specification (including the accompanying claims, abstract, and drawings), and all processes or units of any method or device so disclosed, may be combined in any combination. Unless expressly stated otherwise, each feature disclosed in this specification (including the accompanying claims, abstract, and drawings) may be replaced by an alternative feature serving the same, equivalent, or similar purpose.

Furthermore, those skilled in the art will understand that, although some embodiments described herein include certain features included in other embodiments and not others, combinations of features from different embodiments are meant to be within the scope of the invention and to form different embodiments.

The component embodiments of the present invention may be implemented in hardware, in software modules running on one or more processors, or in a combination of the two. Those skilled in the art should understand that a microprocessor or digital signal processor (DSP) can be used in practice to implement some or all of the functions of some or all of the components of the website security detection device according to embodiments of the invention. The invention can also be implemented as a device or apparatus program (for example, a computer program or computer program product) for performing part or all of the methods described herein. Such a program may be stored on a computer-readable medium or may take the form of one or more signals. Such signals may be downloaded from an Internet website, provided on a carrier signal, or provided in any other form.

The above describes only some embodiments of the present invention. It should be noted that those of ordinary skill in the art can make improvements and refinements without departing from the principles of the invention, and such improvements and refinements should also be regarded as falling within the scope of protection of the invention.

Claims

1. A website security detection method, characterized in that it comprises the following steps:

Receive the collected data including the hypertext transfer protocol request packet through the remote port;

Use the link contained in the request packet to determine an associated new link belonging to a known specific website, where the known specific website refers to one or more known websites whose links all resolve to a specific IP address segment;

Implement vulnerability scanning and detection on the webpage corresponding to the new link.

2. The website security detection method according to claim 1, wherein the source IP address of the collected data is the destination IP address of the request packet.

3. The network security detection method according to claim 2, wherein the collected data comes from a collection module installed on the device at the source IP address.

4. The website security detection method according to claim 1, wherein the source IP address of the collected data is the source IP address in the request packet.

5. The website security detection method according to claim 4, wherein the collected data originates from a browser plug-in installed on the device at the source IP address.

6. The website security detection method according to claim 1, wherein before determining the associated new link belonging to a known specific website, the links included in the request packet are summarized and duplicate links are removed.

7. The website security detection method according to claim 6, wherein the step of removing duplicate links comprises the following subdivision steps:

Multiple links formed by accessing the database that differ only in their variables are identified as duplicate links;

Only keep one of the duplicate links to remove duplicate links.
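One possible reading of claim 7 is that links sharing the same host, path, and parameter names, and differing only in parameter values, are treated as duplicates. The sketch below follows that reading; the helper names `link_template` and `drop_variable_duplicates` are hypothetical, and the claim itself does not specify how the comparison is carried out.

```python
from urllib.parse import urlsplit, parse_qsl


def link_template(url):
    """Reduce a link to scheme, host, path and the sorted set of
    query parameter names, ignoring the parameter values."""
    parts = urlsplit(url)
    names = sorted({name for name, _ in
                    parse_qsl(parts.query, keep_blank_values=True)})
    return (parts.scheme.lower(), parts.netloc.lower(),
            parts.path, tuple(names))


def drop_variable_duplicates(links):
    """Keep only the first link seen for each template."""
    seen = set()
    kept = []
    for url in links:
        key = link_template(url)
        if key not in seen:
            seen.add(key)
            kept.append(url)
    return kept
```

Under this assumption, `item.php?id=1` and `item.php?id=2` collapse into one link, while `item.php?id=2&page=3` is kept because it introduces an additional variable.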

8. The website security detection method according to claim 6, wherein the step of removing duplicate links comprises the following subdivision steps:

Identify multiple links with the same signature as duplicate links;

Only keep one of the duplicate links to remove duplicate links.
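Claim 8 does not define how the signature is computed. The sketch below assumes, purely for illustration, that the signature is a SHA-256 digest of the page body, and that a caller-supplied function `fetch_body` (hypothetical) returns the body bytes for a link.

```python
import hashlib


def drop_signature_duplicates(links, fetch_body):
    """Keep one link per distinct page signature.

    Assumption: the signature is the SHA-256 digest of the page body
    returned by the caller-supplied fetch_body(url) -> bytes.
    """
    seen = set()
    kept = []
    for url in links:
        signature = hashlib.sha256(fetch_body(url)).hexdigest()
        if signature not in seen:
            seen.add(signature)
            kept.append(url)
    return kept
```

Passing the fetch function as a parameter keeps the sketch independent of any particular HTTP client.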

9. The website security detection method according to claim 1, characterized in that, the known specific website and/or its new link is preset by receiving user settings through a graphical user interface.

10. The website security detection method according to claim 8, wherein the set content received by the graphical user interface includes a domain name or an IP address pointing to the website.

11. The website security detection method according to claim 1, characterized in that the link is determined to be an associated new link belonging to the known specific website by determining that the IP address pointed to by the link in the request packet belongs to the IP address pointed to by the known specific website or to the IP address segment to which that IP address belongs.
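The membership test described in claim 11 can be sketched with Python's standard `ipaddress` module. The function name `belongs_to_known_site` and the CIDR notation used for address segments are assumptions for this example; resolving a link's host name to an IP address is left to the caller.

```python
import ipaddress


def belongs_to_known_site(link_ip, known_segments):
    """Return True if link_ip falls inside any known address segment.

    known_segments holds CIDR strings; a single address such as
    "192.0.2.10" is treated as a one-address segment.
    """
    addr = ipaddress.ip_address(link_ip)
    return any(addr in ipaddress.ip_network(segment)
               for segment in known_segments)
```

For instance, with the segment `203.0.113.0/24` recorded for a known website, a link resolving to `203.0.113.25` would be treated as associated, while one resolving to `198.51.100.7` would not.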

12. The website security detection method according to claim 1, wherein the link is determined to be an associated new link belonging to the known specific website by comparing the registration characteristic information of the domain name of the link in the request packet with the registration characteristic information of the domain name of the known specific website.

13. The website security detection method according to claim 1, characterized in that a list of known specific websites is provided for recording domain names and/or corresponding IP addresses of one or more said known specific websites.

14. The website security detection method according to claim 1, wherein the step of using the links contained in the request packet to determine the associated new links belonging to a known specific website comprises the following subdivision steps:

Extract the links from all request packets that have been obtained;

Remove, from the extracted links, duplicate links pointing to webpages with the same code, keeping only one link from each group of duplicates;

Identify new links among them and add the new links to the queue to be scanned.
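The final sub-step of claim 14 can be sketched as a queue that accepts only links it has not recorded before. The class name `ScanQueue` and the rule that "new" means "not previously recorded" are assumptions for this illustration.

```python
from collections import deque


class ScanQueue:
    """Queue of links awaiting vulnerability scanning.

    Assumption: a link counts as new if it has not been recorded before.
    """

    def __init__(self, known_links=()):
        self._known = set(known_links)
        self._pending = deque()

    def offer(self, links):
        """Add unseen links to the queue; return how many were added."""
        added = 0
        for url in links:
            if url not in self._known:
                self._known.add(url)
                self._pending.append(url)
                added += 1
        return added

    def take(self):
        """Return the next link to scan, or None if the queue is empty."""
        return self._pending.popleft() if self._pending else None
```

Recording every accepted link in the known set means a link that reappears in later request packets is not queued a second time.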

15. The website security detection method according to claim 1, wherein the step of implementing vulnerability scanning on the web page corresponding to the new link includes the following subdivision steps:

acquiring the new link from a queue to be scanned for recording the new link;

Vulnerability scanning is performed on the webpage mapped to the new link.

16. The website security detection method according to claim 1, wherein the step of implementing vulnerability scanning on the webpage corresponding to the new link comprises the following subdivision steps:

acquiring the new link from a queue to be scanned for recording the new link;

Obtain the webpage mapped by the new link in the queue to be scanned and add it to the local webpage library;

Implement vulnerability scanning and detection on the webpages in the webpage library downloaded according to the new link.
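The download-then-scan flow of claim 16 can be sketched as follows. The function `download_then_scan`, the caller-supplied `fetch_body`, and the `checks` mapping of check names to predicates are all hypothetical; real vulnerability checks are far more involved than the placeholder predicates a caller might pass here.

```python
from collections import deque


def download_then_scan(queue, fetch_body, checks):
    """Drain the queue into a local page library, then scan the library.

    queue:      deque of new links awaiting scanning
    fetch_body: caller-supplied function, url -> page body (bytes)
    checks:     dict mapping a check name to a predicate (url, body) -> bool
    Returns the library and a list of (url, check name) findings.
    """
    library = {}
    while queue:
        url = queue.popleft()
        library[url] = fetch_body(url)

    findings = []
    for url, body in library.items():
        for name, check in checks.items():
            if check(url, body):
                findings.append((url, name))
    return library, findings
```

Separating the download phase from the scan phase allows the pages in the local library to be scanned without contacting the website again.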

17. The website security detection method according to claim 1, characterized in that the method comprises a subsequent step: displaying a graphical user interface to output the result information of vulnerability scanning detection.

18. A website security detection device, characterized in that it comprises:

A packet capture unit, configured to receive, through a remote port, collected data comprising a hypertext transfer protocol request packet;

The novelty checking unit is adapted to use the link contained in the request packet to determine an associated new link belonging to a known specific website, where the known specific website refers to one or more known websites whose links all resolve to a specific IP address segment;

The detection unit is configured to perform vulnerability scanning detection on the webpage corresponding to the new link.

19. The website security detection device according to claim 18, wherein the source IP address of the collected data obtained by the packet capture unit is the destination IP address of the request packet.

20. The website security detection device according to claim 19, wherein the collected data comes from a collection module installed on the device at the source IP address.

21. The website security detection device according to claim 18, characterized in that the source IP address of the data collected by the packet capture unit is the source IP address in the request packet.

22. The website security detection device according to claim 21, wherein the collected data comes from a browser plug-in installed on the device at the source IP address.

23. The website security detection device according to claim 18, wherein the novelty checking unit is configured to summarize the links contained in the request packet and remove duplicate links from them.

24. The website security detection device according to claim 23, wherein the novelty checking unit comprises:

A duplicate checking sub-module, configured to identify as duplicate links multiple links, formed by accessing the database, that differ only in their variables;

A removal sub-module, adapted to keep only one of the duplicate links so as to remove the duplicate links.

25. The website security detection device according to claim 23, wherein the novelty checking unit comprises:

Duplicate checking sub-module, used to determine multiple links with the same signature as duplicate links;

26. The website security detection device according to claim 18, characterized in that the device further comprises a setting unit for displaying a graphical user interface to receive user settings, thereby presetting the known specific website and/or its new links.

27. The website security detection device according to claim 26, characterized in that the set content received by the graphical user interface includes a domain name or an IP address pointing to a website.

28. The website security detection device according to claim 18, characterized in that the device further comprises a setting unit configured to determine the link as an associated new link belonging to the known specific website when the IP address pointed to by the link in the request packet belongs to the IP address pointed to by the known specific website or to the IP address segment to which that IP address belongs.

29. The website security detection device according to claim 18, characterized in that the device further comprises a setting unit configured to determine the link as an associated new link belonging to the known specific website when the registration characteristic information of the domain name of the link in the request packet is the same as the registration characteristic information of the domain name of the known specific website.

30. The website security detection device according to claim 18, characterized in that, the device also includes a list of known specific websites for recording the domain names of one or more of the known specific websites and/or their corresponding IP address.

31. The website security detection device according to claim 18, wherein the novelty checking unit comprises:

An extraction module, configured to extract the links from all request packets that have been obtained;

The de-duplication module is configured to remove, from the links extracted by the extraction module, duplicate links pointing to webpages with the same code, keeping only one link from each group of duplicates;

The adding module is configured to determine a new link therein, and add the new link to a queue to be scanned.

32. The website security detection device according to claim 18, wherein the detection unit comprises:

an acquiring unit configured to acquire the new link from a queue to be scanned for recording the new link;

The implementation unit is configured to implement vulnerability scanning and detection on the webpage mapped to the new link.

33. The website security detection device according to claim 18, wherein the detection unit comprises:

A downloading unit, configured to download the webpage mapped by the new link in the queue to be scanned and add it to the local webpage library;

The implementation unit is used for implementing vulnerability scanning and detection on the webpages in the webpage library downloaded according to the new link.

34. The website security detection device according to claim 18, characterized in that the device comprises a display unit for displaying a graphical user interface for outputting result information of vulnerability scanning detection.