Background
WEB backdoor: a WEB backdoor is a command execution environment in the form of a WEB page file such as ASP, PHP, JSP, or CGI, and may also be referred to as a webshell. After a hacker invades a website, the ASP or PHP backdoor file is usually mixed in with the normal WEB page files in the WEB directory of the website server; the hacker can then access the backdoor with a browser to obtain a command execution environment and thereby control the website server.
Webpage crawler technology: the first step in active WEB backdoor detection is identifying suspicious WEB paths. Usually a WEB crawler fetches the website's pages to traverse the entire directory and file structure and obtain a path tree of the website. However, since a WEB backdoor is implanted by a hacker through a website vulnerability and placed in a hidden location, it generally cannot be reached by a WEB crawler.
WEB backdoor collection: collecting information such as the file names, paths, and script types of published WEB backdoors.
WEB backdoor path combination: path combination is the main means of WEB backdoor path detection. Features of historically discovered WEB backdoors, such as paths, file names, script types, and common words, are combined with the paths of the current website to generate a large number of candidate paths.
A WEB backdoor is a command execution environment in the form of a WEB page file and may also be referred to as a webshell. After a hacker invades a website, the backdoor file is mixed in with the normal webpage files in the WEB directory of the website server; the hacker can then access the WEB backdoor with a browser to obtain a command execution environment and thereby control the website server. A WEB backdoor is therefore a serious danger to the user's server, and finding and deleting it in time is critical to guaranteeing server security.
After a hacker invades a website, the backdoor file is placed in a hidden location, usually as an isolated link (a link not present in the website's URL tree), and is difficult for an administrator to find. Finding a WEB backdoor is therefore difficult, and there are two general approaches:
firstly, inspecting the website's access records;
and secondly, active scanning and identification.
Accordingly, there is a need for improvements in the art.
Disclosure of Invention
The invention aims to provide an efficient web backdoor path detection method.
In order to solve the technical problem, the invention provides a web backdoor path detection method, which comprises the following steps:
1) acquiring a path set $Webshell_Path and a file name set $Webshell_Name;
2) crawling from the website home page with a Web crawler to obtain the website's directory tree $Web_Catalog, URL tree $Web_Url_Tree, root directory $Web_Root, and custom error page $Error_Page;
3) acquiring the URL set to be detected $Target_Url from the path set $Webshell_Path, the file name set $Webshell_Name, the directory tree $Web_Catalog, and the URL tree $Web_Url_Tree;
and accessing the links in the URL set to be detected $Target_Url with HTTP requests to obtain the suspicious URL set $Suspicious_Url.
As an improvement to the web backdoor path detection method of the present invention:
step 1 is as follows: adding the path of each known WEB backdoor URL to the path set $Webshell_Path, and adding the file name of each known WEB backdoor URL to the file name set $Webshell_Name.
As a further improvement to the web backdoor path detection method of the present invention:
in step 1: common English words and common person names are also added to the file name set $Webshell_Name.
As a further improvement to the web backdoor path detection method of the present invention:
step 3 comprises the following steps:
3.1) taking the Cartesian product of the directory tree $Web_Catalog and the file name set $Webshell_Name, and adding the result to the URL set to be detected $Target_Url;
taking the Cartesian product of the website root directory $Web_Root and the path set $Webshell_Path, then taking the Cartesian product of that result and the file name set $Webshell_Name, and adding the final result to the URL set to be detected $Target_Url;
de-duplicating the links in the URL set to be detected $Target_Url, and subtracting the intersection of $Target_Url and the URL tree $Web_Url_Tree from $Target_Url (i.e., removing links already present in the URL tree), so as to obtain the final URL set to be detected $Target_Url;
3.2) the suspicious path identification phase:
sequentially detecting the links in the final URL set $Target_Url obtained in step 3.1), accessing each link with an HTTP request; a link whose response code is 200 and whose page is not the custom error page is judged to be a suspicious link and is added to the suspicious URL set $Suspicious_Url.
As a further improvement to the web backdoor path detection method of the present invention:
in step 3.2, the custom error page is judged as follows: the content similarity between the custom error page and the page returned by the accessed link is calculated; when the similarity exceeds a preset threshold, the accessed page is judged to be the custom error page; otherwise it is not.
As a further improvement to the web backdoor path detection method of the present invention:
in step 3.2, similarity is judged using the simhash algorithm.
As a further improvement to the web backdoor path detection method of the present invention:
the custom error page $Error_Page is obtained by requesting a batch of non-existent pages and malicious requests and recording the response page the website returns to each;
the non-existent or malicious requests are constructed as follows:
1) website homepage address + random character string;
2) website homepage address + random character string + script environment;
3) website homepage address + malicious request URL.
The web backdoor path detection method of the present invention has the following technical advantages:
after a hacker implants a WEB backdoor into a website, the method yields a batch of candidate WEB paths, supporting subsequent analysis of the WEB backdoor. The invention provides a WEB backdoor path detection method that can detect the suspicious paths of a website and support further detection of the WEB backdoor.
Detailed Description
The invention will be further described with reference to specific examples, but the scope of the invention is not limited thereto.
Embodiment 1, a web backdoor path detection method, as shown in fig. 1-3, includes three stages:
1. Knowledge collection stage:
a) collecting existing WEB backdoors from a corresponding knowledge base to serve as path and file name features;
b) collecting common English words as file name features;
c) collecting common person names as file name features.
2. Target website information collection stage:
crawling the directory tree and URL tree of the target website.
3. Suspicious path identification stage:
a) combining the directory tree collected in stage 2 with the path and file name features collected in stage 1 to form the URLs to be detected;
b) removing the URLs already present in the website's URL tree from the URL set to be detected, giving the paths to be detected;
c) accessing each path to be detected and examining the HTTP response code and content; a path whose response code is 200 and whose page is not the custom error page is a suspicious path.
For the sake of accurate description, the following definitions are made:
Knowledge base:
path set: $Webshell_Path
file name set: $Webshell_Name
Website information:
directory tree: $Web_Catalog
URL tree: $Web_Url_Tree
website root directory: $Web_Root
custom error page: $Error_Page
URL set to be detected: $Target_Url
Suspicious URL set: $Suspicious_Url
The method specifically comprises the following steps:
1) Knowledge collection stage:
existing WEB backdoors are collected from a knowledge base; for each WEB backdoor URL, its path is added to the path set $Webshell_Path and its file name is added to the file name set $Webshell_Name.
Common English words and common person names are also added to the file name set $Webshell_Name.
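A minimal sketch of how the two knowledge sets might be held in memory follows (illustrative only; the seed values are the example entries used later in this description, and the appended words are hypothetical placeholders):

```python
# Illustrative sketch of the knowledge base. The seed values below are
# the example entries from this description; a real deployment would
# load them from a curated collection of published webshell cases.

# $Webshell_Path: paths under which published WEB backdoors appeared.
webshell_path = {"/user/other/", "/data/th/b/"}

# $Webshell_Name: file names of published WEB backdoors, extended with
# common English words and common person names (placeholders here).
webshell_name = {"log.php", "index.php", "tools.php"}
webshell_name.update(w + ".php" for w in ("admin", "test", "david"))
```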
2) Target website information collection stage
A Web crawler crawls from the website home page to obtain the website's directory tree $Web_Catalog, URL tree $Web_Url_Tree, root directory $Web_Root, and custom error page $Error_Page.
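As an illustrative sketch of this stage (one possible implementation, not the only one), the following code performs a breadth-first crawl; it assumes the third-party `requests` package, and the regex-based link extraction is a deliberate simplification of real HTML parsing:

```python
# Sketch of the target-website information collection stage.
import re
from urllib.parse import urljoin, urlparse

import requests

def crawl(home_page: str, max_pages: int = 500):
    """Breadth-first crawl from the home page; returns $Web_Root,
    the URL tree ($Web_Url_Tree) and the directory tree ($Web_Catalog)."""
    root = home_page.rstrip("/") + "/"            # $Web_Root
    url_tree, catalog = set(), {root}
    queue = [root]
    while queue and len(url_tree) < max_pages:
        url = queue.pop(0)
        if url in url_tree:
            continue
        url_tree.add(url)
        try:
            resp = requests.get(url, timeout=5)
        except requests.RequestException:
            continue
        for href in re.findall(r'href=["\']([^"\']+)["\']', resp.text):
            link = urljoin(url, href)
            if not link.startswith(root):
                continue                          # internal links only
            queue.append(link)
            # record every directory prefix of the link in $Web_Catalog
            parts = urlparse(link).path.split("/")[1:-1]
            for i in range(len(parts)):
                catalog.add(root + "/".join(parts[:i + 1]) + "/")
    return root, url_tree, catalog
```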
The custom error page $Error_Page is obtained by requesting a batch of non-existent pages and malicious requests and recording the response page the website returns to each.
The non-existent or malicious requests are constructed as follows:
1. website homepage address + random character string
2. website homepage address + random character string + script environment suffix
3. website homepage address + malicious request URL (e.g., an SQL injection request)
The website script environments are:
php, jsp, asp, aspx
Three examples are given below, using Baidu for illustration:
1、http://www.baidu.com/fsdhjfhsdcfbnsdfkj
2、http://www.baidu.com/44kd9sn39dj.php
3、http://www.baidu.com/?id=1&1=1
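A sketch of this probing step follows, mirroring the three request forms above; `probe_error_pages` is a hypothetical helper name, and the `?id=1&1=1` query stands in for a harmless malicious-looking request:

```python
# Sketch of custom-error-page fingerprinting: request pages that
# cannot exist plus a malicious-looking query, keep the response bodies.
import random
import string

import requests

SCRIPT_ENVS = ("php", "jsp", "asp", "aspx")

def probe_error_pages(root: str):
    """Collect candidate $Error_Page bodies from deliberately bad requests."""
    rand = "".join(random.choices(string.ascii_lowercase + string.digits, k=20))
    probes = [root + rand]                                      # 1) random string
    probes += [root + rand + "." + env for env in SCRIPT_ENVS]  # 2) + script env
    probes.append(root + "?id=1&1=1")                           # 3) malicious URL
    pages = []
    for url in probes:
        try:
            pages.append(requests.get(url, timeout=5).text)
        except requests.RequestException:
            pass
    return pages
```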
3) suspicious path identification stage
3.1) combining the paths to be detected, as shown in FIG. 2;
From steps 1 and 2, the path set $Webshell_Path, the file name set $Webshell_Name, the directory tree $Web_Catalog, and the URL tree $Web_Url_Tree are available.
The Cartesian product of the directory tree $Web_Catalog and the file name set $Webshell_Name is computed, and the result is added to the URL set to be detected $Target_Url.
The Cartesian product of the website root directory $Web_Root and the path set $Webshell_Path is computed, the Cartesian product of that result and the file name set $Webshell_Name is computed, and the final result is added to the URL set to be detected $Target_Url.
The links in $Target_Url are then de-duplicated, and the intersection of $Target_Url and the URL tree $Web_Url_Tree is subtracted from $Target_Url (removing links the crawler already found), yielding the final URL set to be detected $Target_Url.
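The combination and subtraction just described can be sketched with `itertools.product` over the sets produced by the earlier stages (`combine_targets` is a hypothetical helper name):

```python
# Sketch of the path-combination step 3.1). Inputs are the sets built
# in the earlier stages; all names are illustrative.
from itertools import product

def combine_targets(root, catalog, webshell_path, webshell_name, url_tree):
    target_url = set()
    # $Web_Catalog x $Webshell_Name
    target_url.update(d + n for d, n in product(catalog, webshell_name))
    # ($Web_Root x $Webshell_Path) x $Webshell_Name
    target_url.update(root.rstrip("/") + p + n
                      for p, n in product(webshell_path, webshell_name))
    # the set de-duplicates links implicitly; finally drop every link
    # the crawler already saw, since those are legitimate site pages
    return target_url - url_tree
```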
3.2), suspicious path identification phase, as shown in FIG. 3
The links in the final URL set $Target_Url obtained in step 3.1) are detected in turn: each link is accessed with an HTTP request, and a link whose response code is 200 and whose page is not the custom error page is judged to be a suspicious link and added to the suspicious URL set $Suspicious_Url. Whether a page is the custom error page is judged by comparison with $Error_Page: if the page contents are similar, the page is the custom error page; otherwise it is not.
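A sketch of this loop follows (again assuming the `requests` package); `is_similar` is the simhash comparison sketched in the next step:

```python
# Sketch of the suspicious-path identification loop in step 3.2).
import requests

def find_suspicious(target_url, error_pages, is_similar):
    suspicious_url = set()
    for url in sorted(target_url):
        try:
            resp = requests.get(url, timeout=5)
        except requests.RequestException:
            continue
        if resp.status_code != 200:
            continue  # only 200 responses can be suspicious links
        # a 200 page that matches the custom error page is a soft 404
        if any(is_similar(resp.text, err) for err in error_pages):
            continue
        suspicious_url.add(url)
    return suspicious_url
```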
Whether page contents are similar is judged with the existing simhash algorithm, as follows:
3.21) computing the fingerprint value of each page with the simhash algorithm;
3.22) computing the Hamming distance between the two pages from their fingerprint values;
3.23) setting thresholds empirically: if the page length is greater than 500, the pages are judged similar when the Hamming distance is less than 3; if the page length is less than 500, they are judged similar when the Hamming distance is less than 10.
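A minimal simhash sketch implementing the rule in 3.21)-3.23) follows; tokenizing on word characters and hashing tokens with truncated MD5 are simplifications of this sketch, not requirements of the method:

```python
# Minimal simhash sketch. A 64-bit fingerprint is built per page; two
# pages are "similar" under the length-dependent Hamming thresholds
# described in step 3.23).
import hashlib
import re

def simhash(text: str, bits: int = 64) -> int:
    vector = [0] * bits
    for token in re.findall(r"\w+", text.lower()):
        # 64-bit hash of the token (truncated MD5, a simplification)
        h = int.from_bytes(hashlib.md5(token.encode()).digest()[:8], "big")
        for i in range(bits):
            vector[i] += 1 if (h >> i) & 1 else -1
    return sum(1 << i for i in range(bits) if vector[i] > 0)

def hamming(a: int, b: int) -> int:
    return bin(a ^ b).count("1")

def is_similar(page_a: str, page_b: str) -> bool:
    # empirical thresholds: distance < 3 for long pages, < 10 for short
    threshold = 3 if min(len(page_a), len(page_b)) > 500 else 10
    return hamming(simhash(page_a), simhash(page_b)) < threshold
```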
The suspicious URL set $Suspicious_Url is output.
Web backdoor path detection tool instance
Knowledge base
The path and file name detection libraries for web backdoor detection are preset as follows.
1.1 acquiring the file path sets and file name features that have appeared in published web backdoor cases, through user experience and Internet search;
1.2 collecting common English words and adding them to the file name set;
1.3 collecting common person names and adding them to the file name set.
Aggregating and de-duplicating the collection results of 1.1, 1.2, and 1.3 gives the following file path set:
/user/other/
/data/th/b/
The file name set is as follows:
log.php
index.php
tools.php
Detecting the WEB backdoor of the website http://192.168.5.1:
2. Crawling the directory tree of http://192.168.5.1 with a web crawler
By recursively accessing the pages of http://192.168.5.1 and the internal links in all of those pages via HTTP GET, the directory tree of the website is identified as follows:
http://192.168.5.1/
http://192.168.5.1/a/
http://192.168.5.1/b/
http://192.168.5.1/b/c/
2.1 Suspicious path identification phase
2.1.1 Path combination phase
Combining the paths and file names in the knowledge base with the directories of the target website (via Cartesian products) generates the set of files to be detected.
The file set is as follows:
http://192.168.5.1/log.php
http://192.168.5.1/index.php
http://192.168.5.1/tools.php
http://192.168.5.1/user/other/log.php
http://192.168.5.1/user/other/index.php
http://192.168.5.1/user/other/tools.php
http://192.168.5.1/data/th/b/log.php
http://192.168.5.1/data/th/b/index.php
http://192.168.5.1/data/th/b/tools.php
http://192.168.5.1/a/log.php
http://192.168.5.1/a/index.php
http://192.168.5.1/a/tools.php
http://192.168.5.1/a/user/other/log.php
http://192.168.5.1/a/user/other/index.php
http://192.168.5.1/a/user/other/tools.php
http://192.168.5.1/a/data/th/b/log.php
http://192.168.5.1/a/data/th/b/index.php
http://192.168.5.1/a/data/th/b/tools.php
http://192.168.5.1/b/log.php
http://192.168.5.1/b/index.php
http://192.168.5.1/b/tools.php
http://192.168.5.1/b/user/other/log.php
http://192.168.5.1/b/user/other/index.php
http://192.168.5.1/b/user/other/tools.php
http://192.168.5.1/b/data/th/b/log.php
http://192.168.5.1/b/data/th/b/index.php
http://192.168.5.1/b/data/th/b/tools.php
Identifying the custom error page:
request access to random page by http Get means gets web site response page htm3 (text is too long and symbols are used instead).
The random page is http://192.168.5.1/5tdshfdskjf8ds7fu90dsfjqwkj
2.1.2 Accessing the paths to be detected
The files generated in 2.1.1 are requested via HTTP GET and the response code of each is checked. If a file's response code is 200, its response packet and file path are recorded for subsequent detection; other response codes are not recorded.
For each recorded response page html4, the similarity between html4 and html3 is calculated with the simhash algorithm; if they are similar, the page is discarded.
Here, http://192.168.5.1/b/user/other/log.php responds with 200, and its response page html4 is not similar to html3. Therefore http://192.168.5.1/b/user/other/log.php is a suspicious web backdoor file.
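Tying the sketches above together against this instance (hypothetical: it reuses the helper functions and seed sets sketched in the detailed description and assumes the example host is reachable):

```python
# Hypothetical end-to-end run against the example site. Per this
# instance, the expected output is:
#   {'http://192.168.5.1/b/user/other/log.php'}
root, url_tree, catalog = crawl("http://192.168.5.1")
error_pages = probe_error_pages(root)
targets = combine_targets(root, catalog, webshell_path,
                          webshell_name, url_tree)
print(find_suspicious(targets, error_pages, is_similar))
```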
Term 1: webshell, another name for a website backdoor.
Term 2: isolated link, a link not present in the URL tree of a website.
Finally, it is also noted that the above-mentioned lists merely illustrate a few specific embodiments of the invention. It is obvious that the invention is not limited to the above embodiments, but that many variations are possible. All modifications which can be derived or suggested by a person skilled in the art from the disclosure of the present invention are to be considered within the scope of the invention.