HK1184922B

HK1184922B - Method and device for website security detection

Info

Publication number: HK1184922B
Application number: HK13112073.0A
Authority: HK
Inventors: Wu Hanqing (吴翰清); Liu Zhisheng (刘志生)
Original assignee: Alibaba Group Holding Limited (阿里巴巴集团控股有限公司)
Filing date: 2013-10-28
Publication date: 2017-12-15

Description

Website security detection method and device

Technical Field

The present application relates to the field of computer network security, and in particular, to a method and an apparatus for detecting website security.

Background

As cybercrime and hacking become increasingly rampant, network security has become an important challenge of the information age. With the wide adoption of the browser/server (B/S) architecture, more and more programmers write Web applications on this architecture. However, developers differ in skill and experience, and a considerable number of them do not perform the necessary validity checks on user input or on information carried in pages (such as cookies). Attackers can exploit these programming vulnerabilities to break into databases or attack the users of Web applications, thereby obtaining valuable data and illicit gains, so almost every Internet enterprise faces security threats caused by website vulnerabilities. The traditional response to this threat is crawler-based vulnerability detection: crawl the website's pages to collect URLs, inject into those URLs various malformed data that might trigger a security flaw, detect the website's vulnerabilities, and finally return a detection result.

Although crawler-based detection can find website vulnerabilities to some extent, the crawler must issue a large number of requests to the website to fetch pages and discover links, and many of those links have no security vulnerability at all, which wastes network resources and time. Moreover, a crawler discovers links only by fetching pages: if no fetched page links to a vulnerable page, that vulnerability cannot be found, so security vulnerabilities that exist on the website go undetected.

Disclosure of Invention

The present application provides a website security detection method and apparatus that address the security threats caused by website vulnerabilities and provide a website security scanning service for Internet enterprises.

One aspect of the present application provides a website security detection method, including: obtaining a website; identifying a website program used by the website based on the website fingerprint information of the website; detecting the vulnerability of the website according to the website program; and returning a detection result.

Preferably, the obtaining a website specifically includes: obtaining the website according to the website's domain name information or IP address.

Preferably, before the obtaining of a website, characteristics of various website programs are collected to establish a fingerprint information base, and the website fingerprint information is obtained from the fingerprint information base.

Preferably, after the obtaining a website, the method further comprises: determining whether the website has already been detected; and if it has, returning the previous detection result.

Preferably, the identifying, based on the website fingerprint information of the website, the website program used by the website specifically includes: sending a request to fetch the website; obtaining return data based on the request; and identifying the website program used by the website based on the return data.

Preferably, the return data includes a return status code, indicating the status of processing the request, and the website fingerprint keywords of the website.

Preferably, the identifying, based on the return data, the website program used by the website specifically includes: analyzing the return status code; when the return status code indicates that the website or a page on the website exists, analyzing the website fingerprint keywords; and determining the website program used by the website according to the website fingerprint keywords.

Preferably, the detecting vulnerabilities of the website according to the website program specifically comprises: loading all vulnerability information of the website program; and performing detection based on the vulnerability information.

Preferably, the detection result is saved to a database.

The present application further provides a website security detection apparatus, which includes: an obtaining module for obtaining a website; an identification module for identifying the website program used by the website according to the website fingerprint information of the website; a detection module for detecting vulnerabilities of the website according to the website program; and a return module for returning the detection result.

The beneficial effects of the present application are as follows:

In the embodiments of the present application, the website program used by a website is identified by analyzing the website's fingerprint information, and vulnerability detection is then performed specifically for that website program. This reduces the cost of website detection and improves detection speed and coverage: all vulnerabilities of the identified website program can be covered comprehensively, achieving zero false positives and thus reducing the false-positive rate.

Further, in an embodiment of the present application, it is first determined before detection whether the website has already been detected; if so, the previous detection result is returned directly, which greatly saves detection time, cost, and network resources and lets the user obtain the result quickly.

Furthermore, in an embodiment of the present application, the data returned for a page-fetch request, such as the return status code and keywords, is analyzed to identify the website program used by the website.

Drawings

FIG. 1 is a flowchart illustrating a website security detection method according to an embodiment of the present application;

FIG. 2 is a functional block diagram of a website security detection apparatus according to an embodiment of the present application.

Detailed Description

To help those skilled in the art understand the present application better, it is described in detail below with reference to the accompanying drawings.

FIG. 1 is a flowchart of a website security detection method according to an embodiment of the present application. The method of this embodiment includes:

step 110: obtaining a website;

step 112: identifying a website program used by a website based on website fingerprint information of the website;

step 114: detecting the vulnerability of the website according to a website program; and

step 116: returning a detection result.

In step 110, a website is obtained, for example through the website's domain name information or IP address; for instance, the user enters a website domain name through an input device. A domain name is the name of a computer or group of computers on the Internet, consisting of a series of labels separated by dots. It identifies the electronic location (sometimes also called the geographical location) of the computer during data transmission and is an important identifier of network units and individuals on the network, making it easy for others to identify and retrieve the information resources of an enterprise, organization, or individual and thereby better enabling resource sharing on the network. Because a domain name corresponds one-to-one with a website, the website can be obtained once its domain name information is received. For example, if the website to be security-checked is the Baidu website, entering the domain name www.baidu.com locates the Baidu website.
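
As an illustration of step 110, the Python sketch below turns user input (a domain name or an IP address) into a base URL for the later requests. It assumes plain `http` when the input carries no scheme; the function name `normalize_target` and its `(kind, base_url)` return value are hypothetical and not part of the described method.

```python
import ipaddress
from urllib.parse import urlsplit


def normalize_target(user_input: str) -> tuple[str, str]:
    """Return ("ip" or "domain", base URL) for a domain name or IP address.

    Assumption: default to http when no scheme is given.
    """
    text = user_input.strip()
    if "://" not in text:
        text = "http://" + text
    parts = urlsplit(text)
    host = parts.hostname  # lowercased; IPv6 brackets already stripped
    if not host:
        raise ValueError(f"no host in {user_input!r}")
    try:
        ipaddress.ip_address(host)
        kind = "ip"
    except ValueError:
        kind = "domain"
    # IPv6 literals need their brackets back inside a URL.
    netloc = f"[{host}]" if ":" in host else host
    if parts.port is not None:
        netloc += f":{parts.port}"
    return kind, f"{parts.scheme}://{netloc}/"
```

For example, `normalize_target("www.baidu.com")` yields `("domain", "http://www.baidu.com/")`, and an IPv4 address such as `192.0.2.10` is tagged `"ip"`.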

In another embodiment, after step 110 and before step 112, it is determined whether the website has already been detected; if it has, the previous detection result is returned directly, so an already-detected website need not be detected again, which effectively saves time and resources.
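
The "already detected" check amounts to a lookup in front of the scan. A minimal Python sketch, assuming results are kept in a dictionary keyed by site and that `scan` stands in for steps 112 and 114 (both names are hypothetical):

```python
from typing import Callable


def detect_with_cache(site: str, cache: dict[str, dict],
                      scan: Callable[[str], dict]) -> dict:
    """Return the previous result if `site` was already detected, else scan it."""
    if site in cache:
        return cache[site]   # already detected: no new requests are sent
    result = scan(site)      # steps 112-114: identify the program, then detect
    cache[site] = result     # remember the result for the next request
    return result
```

Calling it twice for the same site invokes `scan` only once; the second call returns the stored result.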

If the website has not been detected, step 112 is executed to identify the website program used by the website based on its website fingerprint information. In this embodiment, website fingerprint information refers to page features, directory structures, and other characteristics unique to a website program; by examining these features, the website program used by the website can be identified. The website fingerprint information is obtained, for example, from a fingerprint information base: as shown in FIG. 2, fingerprint information base 213 collects the characteristics of various website programs. Fingerprint information base 213 may be established before step 110, or after step 110 and before step 112; the present application does not limit this.
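
One possible in-memory shape for fingerprint information base 213 is sketched below in Python. It assumes each program is described by page keywords (page features) and probe paths (directory-structure features); the class names and the `exampleforum` entry are hypothetical, while `/wp-login.php` and `/wp-admin/` are standard WordPress paths.

```python
from dataclasses import dataclass, field


@dataclass
class Fingerprint:
    program: str
    page_keywords: list[str] = field(default_factory=list)  # page features
    probe_paths: list[str] = field(default_factory=list)    # directory-structure features


class FingerprintBase:
    """Stand-in for fingerprint information base 213 (illustrative only)."""

    def __init__(self) -> None:
        self._entries: dict[str, Fingerprint] = {}

    def register(self, fp: Fingerprint) -> None:
        # Collect the characteristics of one program; re-registering replaces it.
        self._entries[fp.program] = fp

    def get(self, program: str) -> Fingerprint:
        return self._entries[program]

    def programs(self) -> list[str]:
        return sorted(self._entries)

    def all_probe_paths(self) -> list[str]:
        # Deduplicated, in registration order: paths worth requesting once per site.
        paths: list[str] = []
        for fp in self._entries.values():
            for p in fp.probe_paths:
                if p not in paths:
                    paths.append(p)
        return paths


base = FingerprintBase()
base.register(Fingerprint("wordpress", ["wp-content/themes/", "wp-includes/"],
                          ["/wp-login.php", "/wp-admin/"]))
base.register(Fingerprint("exampleforum", ["powered by exampleforum"],
                          ["/forum/", "/admin/"]))
```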

If the website has not been detected, a request to fetch the website is sent, return data is obtained for the request, and the website program used by the website is identified based on the return data. The request is, for example, an HTTP request, and the return data includes, for example, a return status code and keywords. The return status code is a three-digit number indicating how the server processed the request, with different codes representing different processing states. For example, a return status code of 200 means the server successfully processed the request, and the response headers or body requested are returned with the response; a return status code of 100 means the server has received the first part of the request and is waiting for the rest, so the client should continue sending the request.
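
The fetch step might look like the Python sketch below, which uses only the standard library. The `ReturnData` type, the `User-Agent` string, and the helper names are hypothetical. Note that `urlopen` raises `HTTPError` for 4xx/5xx replies (the sketch recovers the code and body from it), follows redirects by default, and, through `http.client`, skips interim `100 Continue` responses, so a caller normally sees only the final status.

```python
import urllib.error
import urllib.request
from dataclasses import dataclass


@dataclass
class ReturnData:
    status: int   # three-digit return status code
    body: bytes   # raw body, later searched for fingerprint keywords


def fetch(url: str, timeout: float = 10.0) -> ReturnData:
    """Send one GET request and keep the status code even for error replies."""
    req = urllib.request.Request(url, headers={"User-Agent": "fingerprint-sketch/0.1"})
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return ReturnData(resp.status, resp.read())
    except urllib.error.HTTPError as err:
        # The error object still carries the status code and the error page.
        return ReturnData(err.code, err.read())


def status_class(code: int) -> str:
    """Map a status code to its class by its first digit."""
    if not 100 <= code <= 599:
        raise ValueError(f"not an HTTP status code: {code}")
    return {1: "informational", 2: "success", 3: "redirection",
            4: "client error", 5: "server error"}[code // 100]
```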

The keywords are website fingerprint keywords. Because page features and directory structures are unique to each website program, keywords drawn from them indicate the type of website. For example, the keywords 'Xinlang' (Sina) and 'microblog' indicate a microblog website; 'internet surfing' and 'mailbox' indicate a mailbox website; and 'sky carer' and 'forum' indicate a forum website. In this way the page features and directory structure can be determined, and from them the website program used by the website can be identified.
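
The keyword rules above can be sketched as a small rule table in Python, assuming a website type is reported only when all keywords of a rule appear (case-insensitively) in the fetched text. The rule table reuses the example keywords from the description; the function name is hypothetical.

```python
from typing import Optional

# Each rule: (website type, keywords that must ALL appear in the page text).
SITE_TYPE_RULES: list[tuple[str, tuple[str, ...]]] = [
    ("microblog", ("xinlang", "microblog")),
    ("mailbox", ("internet surfing", "mailbox")),
    ("forum", ("sky carer", "forum")),
]


def classify_by_keywords(text: str) -> Optional[str]:
    """Return the first website type whose keywords all occur in `text`."""
    lowered = text.lower()  # case-insensitive substring matching
    for site_type, keywords in SITE_TYPE_RULES:
        if all(k in lowered for k in keywords):
            return site_type
    return None  # no rule matched: the website program is not identified
```

Requiring every keyword of a rule, rather than any one, is a design choice in this sketch: a lone word such as "forum" appears on many unrelated sites.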

In some cases the keywords can be obtained directly from the web page. In other cases they cannot, and the page source code, such as a JS script or HTML, must be examined; the keywords can then be extracted from the source code, for example from internal link addresses. An internal link is a link between content pages under the same website domain name, such as links between channels, columns, and final content pages; even tag links between keywords within the website can count as internal links. An internal link address consists of a link path and a file name, both of which carry distinctive characteristics that can reveal the type of website. For example, on pages of the WordPress blogging program, the internal link addresses of images and style files may contain the keyword wp-content/themes/ or the keyword wp-includes/; if either keyword appears when a website's internal link addresses are analyzed, the website uses WordPress as its blog program. In other embodiments, the keywords may also be obtained from other parts of the source code; the present application does not limit this.
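
A minimal Python sketch of the internal-link analysis, using the standard `html.parser` module: it collects `href`/`src` values, keeps only links on the page's own host, and checks their paths for the WordPress keywords named above. Treating "same host" as the definition of an internal link is an assumption of this sketch, and the class and function names are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class LinkCollector(HTMLParser):
    """Collect href/src attribute values, resolved against the page URL."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(urljoin(self.base_url, value))


def internal_links(html: str, base_url: str) -> list[str]:
    """Return only links whose host equals the page's host."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    collector.close()
    host = urlsplit(base_url).hostname
    return [u for u in collector.links if urlsplit(u).hostname == host]


# Path keywords from the description, matched against the link path only.
WORDPRESS_MARKERS = ("/wp-content/themes/", "/wp-includes/")


def looks_like_wordpress(html: str, base_url: str) -> bool:
    return any(marker in urlsplit(link).path
               for link in internal_links(html, base_url)
               for marker in WORDPRESS_MARKERS)
```

Restricting the check to internal links avoids a false match when a page merely links to some other WordPress site.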

In the specific analysis process, the return status code is analyzed first. When it indicates that the website or a page on the website exists (for example, the return status code is 200), the website fingerprint keywords are analyzed, and the website program used by the website is then determined from those keywords.

In another embodiment, a return status code of 200 is not taken as conclusive: depending on the server configuration, a 200 status, which normally means the request was processed successfully, may accompany an error page that would normally be returned with status code 404. To further determine whether the website exists, the length of the return data is also analyzed. For example, when two pieces of return data have the same name but one is 100 bytes long and the other 200 bytes long, the 100-byte return data can be judged to be the error page, while the 200-byte return data indicates that the website exists and was returned successfully.
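
The two-stage check (status code first, then data length) could be sketched as follows in Python. The sketch assumes the error-page length comes from a baseline request to a path that should not exist, and adds a small `tolerance` because error pages often echo the requested path; both the baseline approach and the parameter are assumptions, not details from the description. Keyword analysis would proceed only when the function returns `True`.

```python
from typing import Optional


def page_exists(status: int, length: int,
                error_page_length: Optional[int], tolerance: int = 0) -> bool:
    """Decide whether a fetched page really exists.

    status: return status code of the request
    length: length in bytes of the return data
    error_page_length: length of a response known to be an error page
        (assumed to come from requesting a path that should not exist)
    tolerance: bytes of slack when comparing the two lengths
    """
    if status != 200:
        return False   # in this sketch only 200 counts as "exists"
    if error_page_length is None:
        return True    # no baseline available: trust the status code
    # A 200 response as long as the error page is treated as a disguised 404.
    return abs(length - error_page_length) > tolerance
```

With the numbers from the example above, a 200-byte response against a 100-byte error page counts as existing, while a 100-byte response does not.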

In another embodiment, more than one request is sent, and the logical relationship between these requests is recorded; the return data therefore also includes its logical relationship with other return data.
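
The description does not say how the relationship between requests is represented. One hypothetical Python representation is a log in which each response record lists the ids of the requests it relates to, for example a real page linked to the baseline error-page probe used in the length comparison:

```python
from dataclasses import dataclass, field
from typing import Iterable


@dataclass
class ReturnRecord:
    request_id: int
    url: str
    status: int
    length: int
    related_to: list[int] = field(default_factory=list)  # ids of related requests


class ReturnLog:
    """Keeps every response of one scan together with how the requests relate."""

    def __init__(self) -> None:
        self._records: dict[int, ReturnRecord] = {}
        self._next_id = 1

    def add(self, url: str, status: int, length: int,
            related_to: Iterable[int] = ()) -> int:
        related = list(related_to)
        for rid in related:
            if rid not in self._records:
                raise KeyError(f"unknown request id {rid}")
        request_id = self._next_id
        self._next_id += 1
        self._records[request_id] = ReturnRecord(request_id, url, status, length, related)
        return request_id

    def related(self, request_id: int) -> list[ReturnRecord]:
        return [self._records[rid] for rid in self._records[request_id].related_to]


# Example: a baseline probe for a path assumed not to exist, then the real page.
log = ReturnLog()
baseline = log.add("http://www.example.com/no-such-page-0000", 200, 100)
page = log.add("http://www.example.com/", 200, 200, related_to=[baseline])
```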

In step 114, vulnerability detection is performed on the website according to the website program. Different website programs have different vulnerability information, so once the website program of the website is determined, the corresponding vulnerability information can be determined. All vulnerability information of the website program is loaded, and detection is performed based on it. The vulnerability information is obtained, for example, from a vulnerability information base: as shown in FIG. 2, vulnerability information base 215 collects the vulnerability information corresponding to various website programs, so once the website program of the website is determined, its vulnerability information can be determined accordingly. Vulnerability information base 215 may be established before step 110, or after step 110 and before step 114; the present application does not limit this.
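
Loading "all vulnerability information of the website program" can be sketched in Python as a filter over vulnerability information base 215. Here, purely as an assumption, each entry records the first unaffected version, and detection compares it with the site's version; a real detector would more likely send probe requests. All entries and identifiers are placeholders, not real advisories.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VulnInfo:
    vuln_id: str               # placeholder identifier, not a real advisory
    program: str               # website program the entry belongs to
    fixed_in: tuple[int, ...]  # first version that is no longer affected


# Stand-in for vulnerability information base 215 (hypothetical entries).
VULN_BASE = [
    VulnInfo("EXAMPLE-0001", "wordpress", (4, 7, 2)),
    VulnInfo("EXAMPLE-0002", "wordpress", (5, 0)),
    VulnInfo("EXAMPLE-0003", "exampleforum", (2, 1)),
]


def parse_version(text: str) -> tuple[int, ...]:
    return tuple(int(part) for part in text.split("."))


def version_lt(a: tuple[int, ...], b: tuple[int, ...]) -> bool:
    # Pad with zeros so that "4.9" and "4.9.0" compare as equal.
    n = max(len(a), len(b))
    return a + (0,) * (n - len(a)) < b + (0,) * (n - len(b))


def load_vulns(program: str, base: list[VulnInfo] = VULN_BASE) -> list[VulnInfo]:
    """Load ALL vulnerability information recorded for one website program."""
    return [v for v in base if v.program == program]


def detect(program: str, version: str, base: list[VulnInfo] = VULN_BASE) -> list[str]:
    """Report every loaded entry whose fix is newer than the site's version."""
    v = parse_version(version)
    return [info.vuln_id for info in load_vulns(program, base)
            if version_lt(v, info.fixed_in)]
```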

In step 116, the detection result is returned, for example by displaying it on a display unit of the client or outputting it to another client. The detection result can also be stored in a database, so that the next time the website needs to be detected, the result can be retrieved directly from the database and output.
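
Storing results in a database can be sketched with Python's built-in `sqlite3` module. The table layout, JSON encoding of the result, and function names are assumptions; `:memory:` keeps the example self-contained, and a file path would make the results persist between runs.

```python
import json
import sqlite3
from typing import Optional


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE IF NOT EXISTS results ("
                 "site TEXT PRIMARY KEY, result TEXT NOT NULL)")
    return conn


def save_result(conn: sqlite3.Connection, site: str, result: dict) -> None:
    # "with conn" commits the transaction; a re-scan overwrites the old row.
    with conn:
        conn.execute("INSERT OR REPLACE INTO results (site, result) VALUES (?, ?)",
                     (site, json.dumps(result)))


def load_result(conn: sqlite3.Connection, site: str) -> Optional[dict]:
    row = conn.execute("SELECT result FROM results WHERE site = ?",
                       (site,)).fetchone()
    return None if row is None else json.loads(row[0])
```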

An embodiment of the present application further provides a website security detection apparatus. Referring to the functional block diagram in FIG. 2, the apparatus includes:

an obtaining module 210 for obtaining a website;

an identification module 212 for identifying the website program used by the website according to the website fingerprint information of the website;

a detection module 214 for detecting vulnerabilities of the website according to the website program; and

a storage module 216 for returning the detection result.
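
How the four modules fit together can be sketched in Python as a class whose modules are injected as callables. The field names and signatures are hypothetical; each callable stands for one block of FIG. 2 and could be backed by sketches like those for the individual steps.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class WebsiteSecurityDetector:
    obtain: Callable[[str], str]              # obtaining module 210
    identify: Callable[[str], Optional[str]]  # identification module 212
    detect: Callable[[str, str], list[str]]   # detection module 214
    emit: Callable[[str, list[str]], None]    # module 216: returns the result

    def run(self, user_input: str) -> list[str]:
        site = self.obtain(user_input)
        program = self.identify(site)
        # With no identified program there is no vulnerability set to load.
        findings = [] if program is None else self.detect(site, program)
        self.emit(site, findings)
        return findings
```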

How each module of the website security detection apparatus shown in FIG. 2 is implemented becomes clear from the above description of the website security detection method according to the embodiments of the present application; for brevity, it is not described again here.

Through the above embodiments in the present application, at least the following technical effects can be achieved:

It will be apparent to those skilled in the art that various changes and modifications may be made in the present application without departing from the spirit and scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.

Claims

1. A website security detection method, characterized by comprising the following steps:

obtaining a website;

identifying the website program used by the website based on the website fingerprint information of the website, which comprises: sending a request to fetch the website; obtaining return data based on the request, wherein the return data comprises a return status code, a return data length, a logical relationship with other return data, and the website fingerprint keywords of the website, and the return status code indicates the status of processing the request; analyzing the return status code, the return data length, and the logical relationship with other return data; when the website or a page on the website exists, analyzing the website fingerprint keywords; and determining the website program used by the website according to the website fingerprint keywords;

detecting the vulnerability of the website according to the website program;

and returning a detection result.

2. The method of claim 1, wherein the obtaining a website is specifically: obtaining the website according to the domain name information or the IP address of the website.

3. The method of claim 1, wherein before the obtaining a website, characteristics of various website programs are collected to establish a fingerprint information base, and the website fingerprint information is obtained from the fingerprint information base.

4. The method of claim 1, wherein after the obtaining a website, the method further comprises:

determining whether the website has already been detected;

if it has been detected, returning the previous detection result.

5. The method of claim 1, wherein the detecting vulnerabilities of the website according to the website program is specifically:

loading all vulnerability information of the website program;

and performing detection based on the vulnerability information.

6. The method of claim 1, wherein the detection result is saved to a database.

7. A website security detection apparatus, the apparatus comprising:

an obtaining module for obtaining a website;

an identification module for identifying the website program used by the website according to the website fingerprint information of the website, the identifying comprising: sending a request to fetch the website; obtaining return data based on the request, wherein the return data comprises a return status code, a return data length, a logical relationship with other return data, and the website fingerprint keywords of the website, and the return status code indicates the status of processing the request; analyzing the return status code, the return data length, and the logical relationship with other return data; when the website or a page on the website exists, analyzing the website fingerprint keywords; and determining the website program used by the website according to the website fingerprint keywords;

a detection module for detecting vulnerabilities of the website according to the website program;

and a return module for returning the detection result.