CN107918575A

CN107918575A - Method and device for monitoring page status

Info

Publication number: CN107918575A
Application number: CN201610878315.5A
Authority: CN
Inventors: 潘晓明
Original assignee: Beijing Jingdong Century Trading Co Ltd; Beijing Jingdong Shangke Information Technology Co Ltd
Current assignee: Beijing Jingdong Century Trading Co Ltd; Beijing Jingdong Shangke Information Technology Co Ltd
Priority date: 2016-10-08
Filing date: 2016-10-08
Publication date: 2018-04-17
Anticipated expiration: 2036-10-08
Also published as: CN107918575B

Abstract

The present invention provides a method and device for monitoring page status. The method comprises: obtaining an external configuration file that includes a URL list of pages to be tested, where the configuration information includes the URL of each page to be tested, a subpage URL crawling rule, control attribute rules, check rules, a JavaScript switch, and client information; reading a URL from the URL list and, using the client information and the JavaScript switch, generating the document object model information of the page corresponding to that URL; and checking that document object model information according to the subpage URL crawling rule, the control attribute rules, and the check rules to generate a check result. The invention can check a specific page or all pages as the user requires, detect problems on the pages to be tested promptly, and report them to website maintenance personnel in time.

Description

Method and device for monitoring page status

Technical field

The present invention relates to the field of computer networks and computer software, and in particular to a method and device for monitoring page status.

Background

When a user browses the web, the page loaded and displayed by the browser often renders abnormally for various reasons: a product price may be missing, an activity page may have expired, or an activity page may be inaccessible. A technical means is therefore needed to detect the failure state of a page as soon as the abnormality occurs.

At present, the failure state of a page is usually detected by manually opening specific pages at regular intervals and inspecting them. This approach is time-consuming and inefficient, and it can hardly satisfy an enterprise's need to monitor the status of its website.

Summary of the invention

In view of this, the purpose of the present invention is to provide a method and device for monitoring page status that obtain, as early as possible, all pages associated with a target page together with their page information and page control attributes, and that give timely feedback to website maintenance personnel.

The technical solution of the present invention provides a method for monitoring page status, the method comprising:

Step S101: obtain an external configuration file that includes a URL list of pages to be tested, where the configuration information in the external configuration file includes the URL of the page to be tested, a subpage URL crawling rule, a control attribute rule, a check rule, a JavaScript switch, and client information;

Step S102: read a URL from the URL list of pages to be tested and, for that URL, generate the document object model information of the corresponding page using the client information and the JavaScript switch;

Step S103: check the document object model information of the page corresponding to the URL according to the subpage URL crawling rule, the control attribute rule, and the check rule, and generate a check result.

Optionally, the subpage URL crawling rule specifies the domain name information or the specific URL to be checked;

the control attribute rule is used to obtain control information of the page to be tested;

the check rule is used to obtain problem prompt information that appears on the page to be tested;

the JavaScript switch determines whether JavaScript scripts are executed when the page is checked;

the client information is used to simulate the attributes of the client that accesses the page.

Optionally, step S102 further includes: when reading a URL from the URL list of pages to be tested, determining whether the page to be tested needs to execute JavaScript, and loading the page to be tested through a page request simulation tool according to the client information.

Optionally, step S103 further includes: using a regular expression to extract from the document object model information of the page to be tested, and save, the URLs that conform to the subpage URL crawling rule;

inlining the CSS content of the document object model information into the document object model information to generate inlined document object model information;

checking the inlined document object model information with regular expressions according to the control attribute rule and the check rule, and generating the check result.

Optionally, the method further includes: saving the final check result and the problem prompt information to a database, and notifying the page maintainers of the check result and the prompt information of the problem pages by email.

Optionally, the configuration information in the external configuration file further includes a crawl depth, which sets the number of page levels to check.

Optionally, the method further includes: crawling the URLs corresponding to the crawl depth according to the subpage URL crawling rule to generate a new URL list of pages to be tested, then repeating step S102 and step S103 until all URLs in the URL list of pages to be tested have been checked, to obtain the final check result.

The present invention also provides a device for monitoring page status, the device comprising:

a configuration information module, configured to obtain an external configuration file that includes a URL list of pages to be tested, where the configuration information in the external configuration file includes the URL of the page to be tested, a subpage URL crawling rule, a control attribute rule, a check rule, a JavaScript switch, client information, and a crawl depth;

a URL parsing module, configured to read a URL from the URL list of pages to be tested and, for that URL, generate the document object model information of the corresponding page using the client information and the JavaScript switch;

a page checking module, configured to check the document object model information of the page corresponding to the URL according to the subpage URL crawling rule, the control attribute rule, and the check rule, and generate a check result;

a page monitoring module, configured to, if the crawl depth of the page to be tested is greater than 1, crawl the URLs corresponding to the crawl depth according to the subpage URL crawling rule to generate a new URL list of pages to be tested, and then repeatedly invoke the URL parsing module and the page checking module until all URLs in the URL list of pages to be tested have been checked, to obtain the final check result.

The client information is used to simulate the attributes of the client that accesses the page;

the crawl depth sets the number of page levels to check.

Optionally, the device further includes: a page judging module, configured to determine, when a URL is read from the URL list, whether the page to be tested needs to execute JavaScript, and to load the page to be tested through a page request simulation tool according to the client information.

Optionally, the page checking module further includes:

a URL acquisition module, configured to use a regular expression to extract from the document object model information of the page to be tested, and save, the URLs that conform to the subpage URL crawling rule;

a CSS inlining module, configured to inline the CSS content of the document object model information into the document object model information to generate inlined document object model information;

a DOM checking module, configured to check the inlined document object model information with regular expressions according to the control attribute rule and the check rule, and generate the check result.

Optionally, the device further includes: a data processing module, configured to save the final check result and the problem prompt information to a database, and to notify the page maintainers of the check result and the prompt information of the problem pages by email.

Optionally, the device further includes: a page monitoring module, configured to crawl the URLs corresponding to the crawl depth according to the subpage URL crawling rule to generate a new URL list of pages to be tested, and then repeat step S102 and step S103 until all URLs in the URL list of pages to be tested have been checked, to obtain the final check result.

With the method and device for monitoring page status provided by the present invention, a specific page or all pages can be checked as the user requires, and problems on the pages to be tested, such as expired pages, broken links, invalid content, and access errors, can be found promptly. In addition, page maintainers can set the check depth and decide whether JavaScript is loaded for the pages to be tested, and the monitoring results are written to a database for later queries.

Brief description of the drawings

To illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention; those of ordinary skill in the art can derive other drawings from them without creative effort. In the drawings:

FIG. 1 is a schematic flowchart of a method for monitoring page status according to an embodiment of the present invention;

FIG. 2 is a schematic diagram of a device for monitoring page status according to an embodiment of the present invention.

Detailed description

To make the purpose, technical solutions, and advantages of the embodiments of the present invention clearer, the embodiments are described in further detail below with reference to the drawings. The exemplary embodiments and their descriptions explain the present invention and do not limit it.

Those skilled in the art know that embodiments of the present invention can be implemented as a system, apparatus, device, method, or computer program product. The present disclosure may therefore be embodied entirely in hardware, entirely in software (including firmware, resident software, microcode, and so on), or in a combination of hardware and software.

It should be understood that the terms used herein have the following meanings:

URL: a Uniform Resource Locator is a concise representation of the location of, and access method for, a resource available on the Internet; it is the address of a standard resource on the Internet. Every file on the Internet has a unique URL, which indicates where the file is located and how the browser should handle it;

JavaScript: a scripting language used in web pages, often used to add dynamic functions to a page and give users a smoother, more attractive browsing experience. JavaScript scripts usually do their work by being embedded in HTML;

Cookie: information stored in the local browser, which can be used to communicate directly with the web server and from which user information can be obtained directly;

user-agent: part of the HTTP request header, used to inform the server about the current client;

DOM: Document Object Model, a model that describes the code of a page;

CSS: an external style file for web pages, used to manage page style information in a unified manner.

Exemplary method

The method of an exemplary embodiment of the present invention is introduced below. Note that the embodiments of the present invention are not limited in this respect; they can be applied to any applicable scenario.

For example, FIG. 1 is a schematic flowchart of a method for monitoring page status according to an embodiment of the present invention.

As shown in the figure, the method includes:

Optionally, step S102 further includes: when reading a URL from the URL list, determining whether the page to be tested needs to execute JavaScript, and loading the page to be tested through a page request simulation tool according to the client information.

Optionally, step S103 includes: using a regular expression to extract from the document object model information of the page to be tested, and save, the URLs that conform to the subpage URL crawling rule;

Optionally, the method further includes: saving the final check result and the problem prompt information to a database, and notifying the page maintainers of the check result and the prompt information of the problem pages by email.

Embodiment

The present invention is described below with reference to a specific embodiment. This embodiment serves only to describe the invention better and does not unduly limit it.

Specifically, the user sets the configuration information in the external configuration file in advance, including the URL list of pages to be tested. Depending on the user's needs, the configuration information may include, for example, the subpage URL crawling rule, control attribute rules, check rules, the JavaScript switch, client information, the crawl depth, and whether to write results to a database.

The configuration information is described in detail below:

(1) URL. The user defines the URLs that are checked in this embodiment; the URL of each page to be tested is configured in the URL list of the external configuration file, for example http://sale.jd.com/app/test.html.

(2) Subpage URL crawling rule. The rule can match domain name information;

for example, if http://sale.jd.com/ is configured, the rule finds all URL links on the page that start with http://sale.jd.com/.

Alternatively, the subpage URL crawling rule can match a specific URL;

for example, if http://sale.jd.com/abc.html is configured, only the URL link http://sale.jd.com/abc.html is followed under the rule.

(3) Control attribute rule. Page controls are the elements of a web page's markup: for example, <div> represents a region and <table> represents a table. Controls are distinguished by their id attribute; for example, <table id=1> and <table id=2> denote two different tables. Each control also has other attributes, such as length and visibility. The user can therefore configure a check on whether a given control is visible; under the control attribute rule, the program locates the control in the page and determines whether its visibility matches what the user requires, and captures the problem if it does not. Configuring the page control information serves the purpose of checking the page style in step S104.

(4) Check rule. The user sets problem types that decide whether a page has a problem, for example "a 404 error occurred", "the activity has expired", or "the page does not exist". If a page displays expiry information such as "the activity has expired", the program captures the problem prompt information of the page to be tested under this check rule. Configuring the problem prompt information serves the purpose of checking page error information in step S104.

(5) JavaScript switch (JavaScriptEnabled). The user uses the JavaScript switch to set whether the JavaScript scripts in the page to be tested are executed. A page whose scripts are executed shows its dynamically generated content in the final page content; a page whose scripts are not executed does not (provided the page has dynamically generated content at all). The JavaScript switch thus determines whether the document object model information (DOM) produced in step S103 is the DOM generated after the JavaScript scripts have run.

(6) Client information (user-agent). The user sets the client information (user-agent) to simulate the client that accesses the page: setting iPhone simulates access from a browser on an iPhone, and setting PC simulates access from a browser on a computer. Configuring the client information serves the purpose of simulating the client in step S103.

(7) Database switch (DBSwitch). The user can turn on the database switch so that the captured page prompt information is written to a database. The database configuration must also supply the database address, the table name, and the account and password used to access the database. Configuring the database switch serves the purpose of writing the final check result to the database in step S105.

In an embodiment of the present invention, the configuration information may further include a crawl depth (Depth), which the user sets so that the monitoring program checks pages recursively. If the crawl depth is 1, the monitoring program does not crawl the links in the page to be tested and finishes once the entry pages have been checked. If the crawl depth is 2, the monitoring program collects the links that match the user-defined link rule and saves them; after all entry pages have been checked, it checks the saved links, and so on for greater depths. Configuring the crawl depth serves the purpose of the recursive check in step S104.
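The configuration items above could be gathered in a single structure. The following Python sketch is illustrative only: the patent does not specify a file format, so JSON is assumed here, and the field names urls, subpage_url_rule, control_rules, and check_rules are hypothetical. Only JavaScriptEnabled, user-agent, DBSwitch, and Depth are names taken from the text.

```python
import json
from dataclasses import dataclass, field


@dataclass
class MonitorConfig:
    """Illustrative container for the configuration items described above."""
    urls: list                                          # (1) URL list of pages to be tested
    subpage_url_rule: str                               # (2) domain prefix or one specific URL
    control_rules: list = field(default_factory=list)   # (3) control attribute rules
    check_rules: list = field(default_factory=list)     # (4) problem phrases to look for
    javascript_enabled: bool = False                    # (5) JavaScriptEnabled
    user_agent: str = "PC"                              # (6) user-agent
    db_switch: bool = False                             # (7) DBSwitch
    depth: int = 1                                      # Depth; 1 means entry pages only


def load_config(text: str) -> MonitorConfig:
    """Parse a JSON document into a MonitorConfig, applying defaults."""
    raw = json.loads(text)
    return MonitorConfig(
        urls=raw["urls"],
        subpage_url_rule=raw["subpage_url_rule"],
        control_rules=raw.get("control_rules", []),
        check_rules=raw.get("check_rules", []),
        javascript_enabled=raw.get("JavaScriptEnabled", False),
        user_agent=raw.get("user-agent", "PC"),
        db_switch=raw.get("DBSwitch", False),
        depth=raw.get("Depth", 1),
    )


# Hypothetical configuration file content (assumed JSON layout).
EXAMPLE = """
{
  "urls": ["http://sale.jd.com/app/test.html"],
  "subpage_url_rule": "http://sale.jd.com/",
  "check_rules": ["404", "expired"],
  "JavaScriptEnabled": true,
  "user-agent": "iPhone",
  "Depth": 2
}
"""

config = load_config(EXAMPLE)
```

Items omitted from the file fall back to conservative defaults (no JavaScript, depth 1, no database), which is a design choice of this sketch rather than something the patent states.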

Step S102: read a URL from the URL list of pages to be tested and, for that URL, generate the document object model information of the corresponding page using the client information and the JavaScript switch.

Note that when a URL is read from the URL list, it must also be determined whether the page to be tested needs to execute JavaScript, and the page is loaded through a page request simulation tool according to the client information.

Specifically, the JavaScriptEnabled setting is read, and if JavaScriptEnabled is true, JavaScript execution is enabled.

For example, the URL of a page to be tested is http://testabc.com/index.html, and its HTML content is as follows:

<html>
<script>
document.write("Hello");
</script>
</html>

If this page is opened in a browser, "Hello" is displayed on the page. Likewise, if JavaScript is executed, "Hello" appears in the document object model information (DOM) generated for the page to be tested; if JavaScript is not executed, the keyword "Hello" does not appear.

In addition, with an ordinary HTTP request tool, the document object model information (DOM) of a static page can be returned, but for a dynamic page the returned DOM does not include the dynamically generated content.

In an embodiment of the present invention, the page to be tested is requested with the phantomjs tool. With the JavaScript switch turned on (JavaScript execution allowed), the document object model information (DOM) can be obtained for both dynamic and static pages, for later analysis.

In the prior art, a user client is usually simulated by editing the HTTP request headers by hand, which is cumbersome and error-prone.

In the present invention, setting the cookie information simulates a logged-in user, and setting the user-agent value simulates a particular user client. For example, the configuration item <user-agent>iPhone tells the monitor to access the page as an iPhone mobile terminal.

Take http://www.163.com as an example: the website returns different page content depending on this value. In a real test scenario we need to examine the output for a specific client, for instance whether a user visiting http://www.163.com from an iPhone mobile terminal can see the Sports section. Setting the user-agent value makes it possible to simulate an iPhone user.
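The client simulation described above comes down to setting two request headers. The sketch below uses Python's standard urllib.request rather than phantomjs, so it does not execute JavaScript; it only shows how a configured client name and cookie might be turned into headers. The USER_AGENTS table and its strings are illustrative assumptions, not values from the patent.

```python
import urllib.request
from typing import Optional

# Hypothetical mapping from a configured client name to a User-Agent string.
# The strings are illustrative; real browsers send longer, versioned values.
USER_AGENTS = {
    "iPhone": "Mozilla/5.0 (iPhone; CPU iPhone OS 10_0 like Mac OS X)",
    "PC": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
}


def build_request(url: str, client: str,
                  cookie: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) a request that presents itself as `client`."""
    headers = {"User-Agent": USER_AGENTS[client]}
    if cookie is not None:
        headers["Cookie"] = cookie  # simulates a logged-in user
    return urllib.request.Request(url, headers=headers)


request = build_request("http://www.163.com", "iPhone", cookie="session=example")
```

Keeping the client names in one table means the configuration file only needs a short value such as iPhone, which is the convenience the text contrasts with hand-edited headers.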

Finally, using the client information and the JavaScript switch, the document object model information (DOM) of the page to be tested is generated and saved for further analysis.

First, regular expressions are used to extract from the document object model information (DOM) of the page to be tested, and save, the URLs that conform to the subpage URL crawling rule. By matching regular expressions against the page, the invention can filter out the items to be analyzed: page problem descriptions, sub-activity links, attributes of specific controls, image links, and so on.
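A minimal sketch of this link extraction step follows. It assumes one convention that the patent does not state: a rule ending in "/" is treated as a domain prefix, and any other rule must match the link exactly. The function name extract_sub_urls is hypothetical.

```python
import re

# Matches the value of href="..." or href='...' in the page source.
HREF_RE = re.compile(r"""href\s*=\s*["']([^"']+)["']""", re.IGNORECASE)


def extract_sub_urls(dom: str, rule: str) -> list:
    """Return unique links in `dom` that satisfy the subpage URL crawling rule.

    Assumed convention: a rule ending in "/" is a domain prefix,
    any other rule must equal the link exactly.
    """
    found = []
    for link in HREF_RE.findall(dom):
        ok = link.startswith(rule) if rule.endswith("/") else link == rule
        if ok and link not in found:
            found.append(link)
    return found


dom = (
    '<a href="http://sale.jd.com/abc.html">A</a>'
    '<a href="http://sale.jd.com/app/test.html">B</a>'
    '<a href="http://other.example.com/x.html">C</a>'
)
by_domain = extract_sub_urls(dom, "http://sale.jd.com/")
by_url = extract_sub_urls(dom, "http://sale.jd.com/abc.html")
```

With the domain rule both sale.jd.com links are kept and the third link is dropped; with the specific-URL rule only abc.html is kept.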

Second, the document object model information (DOM) is parsed before inlining, and the CSS content of the URL is inlined into the DOM to generate inlined document object model information. From this DOM, the style attributes of a specific control can be read directly.

For example, before the DOM is inlined, the display attribute of a control has to be looked up through the corresponding rule in the CSS file; after inlining, the control's attributes can be read directly, for example: <span style="display:block">Hello</span>.
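To make the inlining step concrete, here is a deliberately small sketch. It is not the patent's implementation: it handles only rules inside an embedded <style> block (not external CSS files), supports only three selector forms (#id, .class, and a bare tag name), expects double-quoted attributes, and ignores CSS specificity. All function names are hypothetical.

```python
import re

STYLE_BLOCK_RE = re.compile(r"<style[^>]*>(.*?)</style>", re.DOTALL | re.IGNORECASE)
RULE_RE = re.compile(r"([^{}]+)\{([^{}]*)\}")
OPEN_TAG_RE = re.compile(r"<([a-zA-Z][\w-]*)((?:\s[^<>]*?)?)(/?)>")
ATTR_RE = re.compile(r'([\w-]+)\s*=\s*"([^"]*)"')
STYLE_ATTR_RE = re.compile(r'\sstyle\s*=\s*"[^"]*"', re.IGNORECASE)


def parse_rules(css):
    """Return (selector, declarations) pairs from simple CSS text."""
    rules = []
    for selectors, body in RULE_RE.findall(css):
        declarations = body.strip().rstrip(";").strip()
        for selector in selectors.split(","):
            rules.append((selector.strip(), declarations))
    return rules


def selector_matches(selector, tag, attrs):
    """Support only three selector forms: #id, .class and a bare tag name."""
    if selector.startswith("#"):
        return attrs.get("id") == selector[1:]
    if selector.startswith("."):
        return selector[1:] in attrs.get("class", "").split()
    return selector.lower() == tag.lower()


def inline_css(html):
    """Copy matching <style> rules onto each element's style attribute."""
    rules = parse_rules("\n".join(STYLE_BLOCK_RE.findall(html)))

    def rewrite(match):
        tag, attr_text, slash = match.groups()
        attrs = dict(ATTR_RE.findall(attr_text))
        declarations = [d for s, d in rules if selector_matches(s, tag, attrs)]
        if not declarations:
            return match.group(0)  # nothing applies, keep the tag unchanged
        if "style" in attrs:
            declarations.append(attrs["style"])  # existing inline style goes last
        attr_text = STYLE_ATTR_RE.sub("", attr_text)
        return '<%s%s style="%s"%s>' % (tag, attr_text, "; ".join(declarations), slash)

    # Remove the <style> blocks, then rewrite every opening tag.
    return OPEN_TAG_RE.sub(rewrite, STYLE_BLOCK_RE.sub("", html))


page = '<html><style>#greet { display: block; }</style><span id="greet">Hello</span></html>'
inlined = inline_css(page)
```

After inlining, the display value sits on the element itself, so a later check can read it with a single regular expression instead of resolving the stylesheet.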

Finally, the inlined document object model information (DOM) is checked with regular expressions according to the control attribute rule and the check rule, and a check result is generated.

Specifically, regular expressions built from the user-defined check rules are used to check whether the inlined document object model information (DOM) contains problem descriptions (for example, "a 404 error occurred", "the activity has expired", or "the page does not exist"); if it does, the problem description is recorded. For example, if the external configuration file is configured to check for page expiry and the page to be tested contains keywords such as "page expired", the expiry problem is recorded against the URL of that activity page.
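The check-rule step can be sketched in a few lines. The patterns below are illustrative English stand-ins for the configured problem phrases, and find_problems is a hypothetical name.

```python
import re


def find_problems(dom: str, check_rules: list) -> list:
    """Return the check rules (regex patterns) that match anywhere in `dom`."""
    return [rule for rule in check_rules if re.search(rule, dom)]


# Illustrative patterns; a real configuration would list the site's own wording.
CHECK_RULES = [r"404", r"activity has expired", r"page does not exist"]
dom = "<html><body><p>Sorry, this activity has expired.</p></body></html>"
problems = find_problems(dom, CHECK_RULES)
```

A bare pattern such as 404 also matches unrelated text like a price of 1404, so in practice the phrases would likely need to be more specific than in this sketch.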

Using the control attribute check rules in the configuration information of the external configuration file, regular expressions are matched against the DOM tree after CSS inlining to find the control attributes to be checked. For example, if a link (an <a> tag) is required to be invisible, and the tag is matched while its display attribute makes it visible, the problem is recorded.
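The visibility example can be sketched as follows, assuming the DOM has already been inlined, that attributes are double-quoted, and that "invisible" means display: none. The function names are hypothetical.

```python
import re


def find_open_tag(dom: str, control_id: str):
    """Return the opening tag of the element with this id, or None."""
    match = re.search(r'<[^<>]*\sid="%s"[^<>]*>' % re.escape(control_id), dom)
    return match.group(0) if match else None


def inline_display(tag: str):
    """Return the `display` value from the tag's style attribute, or None."""
    style = re.search(r'style="([^"]*)"', tag)
    if style is None:
        return None
    display = re.search(r"(?:^|;)\s*display\s*:\s*([\w-]+)", style.group(1))
    return display.group(1) if display else None


def check_hidden(dom: str, control_id: str) -> list:
    """Report a problem when a matched control that must be hidden is visible."""
    tag = find_open_tag(dom, control_id)
    if tag is None:
        return []  # the control was not matched, so nothing is recorded
    display = inline_display(tag)
    if display == "none":
        return []
    return ["control '%s' must be hidden but its display is %s"
            % (control_id, display or "unset")]


dom = '<a id="promo" href="/sale" style="color: red; display: block">Sale</a>'
issues = check_hidden(dom, "promo")
```

An element with no display declaration is treated as visible here, since that is the browser default for a rendered element.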

Step S104: if the crawl depth of the page to be tested is greater than 1, crawl the URLs corresponding to the crawl depth according to the subpage URL crawling rule to generate a new URL list of pages to be tested, then repeat step S102 and step S103 until all URLs in the URL list of pages to be tested have been checked, to obtain the final check result.
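The depth-limited repetition in this step can be sketched as a level-by-level loop. Page loading and checking are passed in as functions so the example runs without network access; fetch_links is assumed to return only links that already satisfy the subpage URL crawling rule, and the LINKS graph is a hypothetical stand-in for real pages.

```python
from typing import Callable, Dict, List


def crawl(entry_urls: List[str], depth: int,
          fetch_links: Callable[[str], List[str]],
          check_page: Callable[[str], List[str]]) -> Dict[str, List[str]]:
    """Check pages level by level; depth 1 checks only the entry pages."""
    results: Dict[str, List[str]] = {}
    current = list(entry_urls)
    for level in range(1, depth + 1):
        next_level: List[str] = []
        for url in current:
            if url in results:       # already checked, avoids link loops
                continue
            results[url] = check_page(url)
            if level < depth:        # collect links only if another level follows
                next_level.extend(fetch_links(url))
        current = next_level
    return results


# Hypothetical link graph standing in for real pages.
LINKS = {
    "http://sale.jd.com/app/test.html": ["http://sale.jd.com/abc.html"],
    "http://sale.jd.com/abc.html": ["http://sale.jd.com/deep.html"],
}


def fetch_links(url: str) -> List[str]:
    return LINKS.get(url, [])


def check_page(url: str) -> List[str]:
    return ["page expired"] if url.endswith("abc.html") else []


entry = ["http://sale.jd.com/app/test.html"]
depth_one = crawl(entry, 1, fetch_links, check_page)
depth_two = crawl(entry, 2, fetch_links, check_page)
```

With depth 1 only the entry page is checked; with depth 2 the saved link abc.html is checked as well, while deep.html, one level further down, is not.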

Once all URLs have been checked, the problems can be recorded in the database by category, using the externally configured database information. The problems are also sent by email to the responsible persons on the externally configured recipient list.
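The recording and notification step might look like the sketch below. It substitutes an in-memory SQLite database for the configured database (the patent names no database product), uses a hypothetical table name page_problems, and only builds the email message without sending it; the addresses are placeholders.

```python
import sqlite3
from email.message import EmailMessage


def save_results(conn, results):
    """Write one row per (url, problem) pair and return the number of rows."""
    conn.execute("CREATE TABLE IF NOT EXISTS page_problems (url TEXT, problem TEXT)")
    rows = [(url, problem) for url, problems in results.items() for problem in problems]
    conn.executemany("INSERT INTO page_problems (url, problem) VALUES (?, ?)", rows)
    conn.commit()
    return len(rows)


def build_report(results, sender, recipients):
    """Build (but do not send) a plain-text problem report."""
    message = EmailMessage()
    message["Subject"] = "Page status report"
    message["From"] = sender
    message["To"] = ", ".join(recipients)
    lines = ["%s: %s" % (url, "; ".join(p)) for url, p in results.items() if p]
    message.set_content("\n".join(lines) or "No problems found.")
    return message


results = {
    "http://sale.jd.com/abc.html": ["page expired"],
    "http://sale.jd.com/app/test.html": [],
}
conn = sqlite3.connect(":memory:")
saved = save_results(conn, results)
report = build_report(results, "monitor@example.com", ["owner@example.com"])
```

Sending the message would be a separate step, for example through smtplib with the mail server taken from configuration, and is left out so the sketch has no external dependencies.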

如图2所示为本发明一实施例的页面状态的监控装置的示意图。该装置2包括：FIG. 2 is a schematic diagram of a device for monitoring page status according to an embodiment of the present invention. The device 2 includes:

a configuration information module 21, configured to obtain an external configuration file that includes a URL list of the pages to be tested, wherein the configuration information in the external configuration file includes the URLs of the pages to be tested, a crawling subpage URL rule, a control attribute rule, a check rule, a JavaScript switch, client information, and a crawl depth;
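By way of illustration only, the configuration information listed above might be represented as follows. The original does not specify a file format; JSON, the field names, the default values, and the names `MonitorConfig` and `load_config` are assumptions of this sketch.

```python
import json
from dataclasses import dataclass, field


@dataclass
class MonitorConfig:
    """Hypothetical in-memory form of the external configuration file."""
    urls: list                                          # URL list of pages to be tested
    sub_url_rule: str                                   # crawling subpage URL rule (regex)
    control_rules: dict = field(default_factory=dict)   # control attribute rules
    check_rules: dict = field(default_factory=dict)     # check rules (regex per problem)
    javascript: bool = False                            # JavaScript switch
    user_agent: str = "Mozilla/5.0"                     # client information
    crawl_depth: int = 1                                # crawl depth


def load_config(text):
    """Parse the configuration from JSON text (format assumed for illustration)."""
    return MonitorConfig(**json.loads(text))
```

Fields omitted from the file fall back to the defaults shown, so a minimal configuration needs only the URL list and the crawling subpage URL rule.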

a URL parsing module 22, configured to read a URL from the URL list of the pages to be tested and, for that URL, generate the document object model information of the corresponding page using the client information and the JavaScript switch;

a page checking module 23, configured to check the document object model information of the page corresponding to the URL according to the crawling subpage URL rule, the control attribute rule, and the check rule, and to generate a check result.

Optionally, the device further includes a page judging module 24, configured to determine, when a URL in the URL list is read, whether the page to be tested needs to execute JavaScript, and to execute the page to be tested through a page request simulation tool according to the client information.
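By way of illustration only, the decision made by the page judging module might be sketched as follows. The original does not name the page request simulation tool; here the client information is assumed to be a User-Agent string, and `render_with_js` is a hypothetical caller-supplied callable standing in for any tool capable of executing JavaScript. The functions `build_request` and `get_dom` are likewise hypothetical.

```python
import urllib.request


def build_request(url, user_agent):
    """Create a request that presents the configured client information."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def get_dom(url, user_agent, javascript, render_with_js=None,
            opener=urllib.request.urlopen):
    """Return the page markup, executing JavaScript only when the switch is on."""
    if javascript:
        if render_with_js is None:
            raise ValueError("JavaScript switch is on but no renderer was supplied")
        # Delegate to the (hypothetical) page request simulation tool.
        return render_with_js(url, user_agent)
    # JavaScript switch off: a plain HTTP request is sufficient.
    with opener(build_request(url, user_agent), timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")
```

Passing the renderer in as a parameter keeps the sketch independent of any particular headless browser.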

Optionally, the page checking module 23 further includes:

a URL obtaining module 231, configured to obtain and save, by using regular expressions, the URLs in the document object model information of the page to be tested that conform to the crawling subpage URL rule;

a CSS inlining module 232, configured to inline the CSS content of the document object model information into the document object model information, generating inlined document object model information;

a DOM checking module 233, configured to check the inlined document object model information using regular expressions according to the control attribute rule and the check rule, and to generate the check result.
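By way of illustration only, the CSS inlining performed before the DOM check might be sketched as follows. The original does not describe how inlining is carried out; this deliberately narrow sketch handles only `#id { ... }` rules found in `<style>` blocks and skips tags that already carry a `style` attribute. The function `inline_id_styles` is hypothetical.

```python
import re


def inline_id_styles(html):
    """Copy `#id { ... }` rules from <style> blocks into style attributes."""
    rules = {}
    for css in re.findall(r"<style[^>]*>(.*?)</style>", html, flags=re.S | re.I):
        for ident, body in re.findall(r"#([\w-]+)\s*\{([^}]*)\}", css):
            # Collapse whitespace so the declaration fits on one attribute line.
            rules[ident] = " ".join(body.split())

    def add_style(match):
        head, ident, tail = match.group(1), match.group(2), match.group(3)
        if ident not in rules or "style=" in match.group(0):
            return match.group(0)  # nothing to inline, or merging not handled
        return '%s style="%s"%s' % (head, rules[ident], tail)

    # Insert the style attribute directly after the matching id attribute.
    pattern = r'(<[a-zA-Z][^>]*?(?<![\w-])id="([\w-]+)")([^>]*>)'
    return re.sub(pattern, add_style, html)
```

After inlining, a visibility declaration sits on the tag itself, which is what allows a regular expression applied to a single tag to decide whether that control is displayed.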

Optionally, the device further includes a data processing module 25, configured to save the final check result and the problem prompt information in a database, and to notify the page maintainers by email of the check result and the prompt information of the problem pages.

Optionally, the device further includes a page monitoring module 26, configured to, if the crawl depth of the page to be tested is greater than 1, crawl the URLs corresponding to the crawl depth according to the crawling subpage URL rule, thereby generating a new URL list of pages to be tested, and then repeatedly invoke the URL parsing module and the page checking module until all URLs in the URL lists of the pages to be tested have been checked, so as to obtain the final check result.

Since the page status monitoring device provided by the present invention corresponds to the method described above, its details are not repeated here.

In addition, although the operations of the method of the present invention are depicted in the drawings in a particular order, this does not require or imply that all of the illustrated operations must be performed to achieve the desired result. Additionally or alternatively, certain steps may be omitted, multiple steps may be combined into one step, and/or one step may be split into multiple steps.

The specific embodiments described above further explain the purpose, technical solutions, and beneficial effects of the present invention in detail. It should be understood that the above are merely specific embodiments of the present invention and are not intended to limit its scope of protection. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.

Claims

1. A method for monitoring page status, characterized in that the method comprises:

Step S101: obtaining an external configuration file, the external configuration file including a URL list of the pages to be tested, wherein the configuration information in the external configuration file includes the URL of the page to be tested, a crawling subpage URL rule, a control attribute rule, a check rule, a JavaScript switch, and client information;

Step S102: reading a URL from the URL list of the pages to be tested and, for that URL, generating the document object model information of the corresponding page using the client information and the JavaScript switch;

Step S103: checking the document object model information of the page corresponding to the URL according to the crawling subpage URL rule, the control attribute rule, and the check rule, and generating a check result.

2. The method of claim 1, wherein,

The crawling sub-page URL rule is set to detect domain name information or a specific URL;

The control attribute rule is to obtain the control information of the page to be tested;

The inspection rule is to obtain prompt information about problems that appear on the page to be tested;

The JavaScript switch is used to determine whether the page being checked needs to execute JavaScript;

The client information is used to simulate client attributes of the accessed page.

3. The method according to claim 1, wherein step S102 further comprises:

When a URL in the URL list of the pages to be tested is read, determining whether the page to be tested needs to execute JavaScript, and executing the page to be tested through a page request simulation tool according to the client information.

4. The method according to claim 1, characterized in that the step S103 comprises:

Obtaining and saving, by using regular expressions, the URLs in the document object model information of the page to be tested that conform to the crawling subpage URL rule;

Inlining the CSS content of the document object model information into the document object model information to generate inlined document object model information;

Checking the inlined document object model information using regular expressions according to the control attribute rule and the check rule, and generating the check result.

5. The method of claim 2, further comprising:

Saving the final check result and the problem prompt information in a database, and notifying the page maintainers by email of the check result and the prompt information of the problem pages.

6. The method according to claim 1, wherein the configuration information in the external configuration file further comprises:

A crawl depth, used to set the number of page levels to be checked.

7. The method according to claim 6, further comprising:

Crawling the URLs corresponding to the crawl depth according to the crawling subpage URL rule, thereby generating a new URL list of pages to be tested, and then repeating steps S102 and S103 until all URLs in the URL lists of the pages to be tested have been checked, so as to obtain the final check result.

8. A monitoring device for page status, characterized in that the device comprises:

A configuration information module, configured to obtain an external configuration file, the external configuration file including a URL list of the pages to be tested, wherein the configuration information in the external configuration file includes the URL of the page to be tested, a crawling subpage URL rule, a control attribute rule, a check rule, a JavaScript switch, client information, and a crawl depth;

A URL parsing module, configured to read a URL from the URL list of the pages to be tested and, for that URL, generate the document object model information of the corresponding page using the client information and the JavaScript switch;

A page checking module, configured to check the document object model information of the page corresponding to the URL according to the crawling subpage URL rule, the control attribute rule, and the check rule, and to generate a check result;

A page monitoring module, configured to, if the crawl depth of the page to be tested is greater than 1, crawl the URLs corresponding to the crawl depth according to the crawling subpage URL rule, thereby generating a new URL list of pages to be tested, and then repeatedly invoke the URL parsing module and the page checking module until all URLs in the URL lists of the pages to be tested have been checked, so as to obtain the final check result.

9. The device of claim 8, wherein:

10. The device according to claim 8, further comprising: a page judging module, configured to determine, when a URL in the URL list of the pages to be tested is read, whether the page to be tested needs to execute JavaScript, and to execute the page to be tested through a page request simulation tool according to the client information.

11. The device according to claim 8, wherein the page checking module further comprises:

A URL obtaining module, configured to obtain and save, by using regular expressions, the URLs in the document object model information of the page to be tested that conform to the crawling subpage URL rule;

A CSS inlining module, configured to inline the CSS content of the document object model information into the document object model information, and to generate inlined document object model information;

A DOM checking module, configured to check the inlined document object model information using regular expressions according to the control attribute rule and the check rule, and to generate the check result.

12. The device of claim 9, further comprising:

A data processing module, configured to save the final check result and the problem prompt information in a database, and to notify the page maintainers by email of the check result and the prompt information of the problem pages.

13. The device according to claim 9, wherein the configuration information in the external configuration file further comprises:

A crawl depth, used to set the number of page levels to be checked.

14. The device according to claim 13, further comprising:

A page monitoring module, configured to crawl the URLs corresponding to the crawl depth according to the crawling subpage URL rule, thereby generating a new URL list of pages to be tested, and then to repeatedly execute steps S102 and S103 until all URLs in the URL lists of the pages to be tested have been checked, so as to obtain the final check result.