
CN107918575A - Method and device for monitoring page status - Google Patents

Info

Publication number
CN107918575A
Authority
CN
China
Prior art keywords
page
url
tested
information
crawling
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201610878315.5A
Other languages
Chinese (zh)
Other versions
CN107918575B (en)
Inventor
潘晓明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jingdong Century Trading Co Ltd
Beijing Jingdong Shangke Information Technology Co Ltd
Original Assignee
Beijing Jingdong Century Trading Co Ltd
Beijing Jingdong Shangke Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Jingdong Century Trading Co Ltd, Beijing Jingdong Shangke Information Technology Co Ltd
Priority to CN201610878315.5A
Publication of CN107918575A
Application granted
Publication of CN107918575B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3003 Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F11/302 Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system component is a software system
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3055 Monitoring arrangements for monitoring the status of the computing system or of the computing system component, e.g. monitoring if the computing system is on, off, available, not available
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3065 Monitoring arrangements determined by the means or processing involved in reporting the monitored data
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/958 Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2201/00 Indexing scheme relating to error detection, to error correction, and to monitoring
    • G06F2201/865 Monitoring of software

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Quality & Reliability (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Information Transfer Between Computers (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The invention provides a method and device for monitoring page status. The method obtains an external configuration file that includes a URL list of the pages to be tested; the configuration information includes the URLs of the pages to be tested, a subpage URL crawling rule, a control attribute rule, a check rule, a JavaScript switch, and client information. Each URL in the list is read and, using the client information and the JavaScript switch, the document object model (DOM) information of the corresponding page is generated. That DOM information is then checked against the subpage URL crawling rule, the control attribute rule, and the check rule to produce a check result. The invention can check either specific pages or all pages as the user requires, detect problems on the pages under test promptly, and report them to the website maintainers.

Description

Method and device for monitoring page status

Technical field

The invention relates to the fields of computer networks and computer software, and in particular to a method and device for monitoring page status.

Background

When a user browses the web, the page loaded by the browser may be displayed abnormally for various reasons: a product price may be missing, a promotion page may have expired, or a promotion page may be inaccessible. A technical means is therefore needed to detect the failed state of a page as soon as the abnormality occurs.

At present, page failures are usually detected by having a person open specific pages at regular intervals and check them. This is time-consuming and inefficient, and it can hardly meet an enterprise's need to monitor the status of its website.

Summary of the invention

In view of this, the object of the invention is to provide a method and device for monitoring page status that obtain, as early as possible, all pages associated with a target page together with their page information and page control attributes, and that report problems to the website maintainers promptly.

The technical solution of the invention is a method for monitoring page status, comprising:

Step S101: obtain an external configuration file that includes a URL list of the pages to be tested, wherein the configuration information in the external configuration file includes the URLs of the pages to be tested, a subpage URL crawling rule, a control attribute rule, a check rule, a JavaScript switch, and client information;

Step S102: read a URL from the URL list of the pages to be tested and, for that URL, generate the document object model (DOM) information of the corresponding page using the client information and the JavaScript switch;

Step S103: check the DOM information of the page corresponding to the URL against the subpage URL crawling rule, the control attribute rule, and the check rule, and generate a check result.

Optionally, the subpage URL crawling rule specifies the domain name information or the specific URL to be checked;

the control attribute rule is used to obtain control information of the page under test;

the check rule is used to obtain problem messages appearing on the page under test;

the JavaScript switch determines whether JavaScript needs to be executed when checking a page;

the client information is used to simulate the attributes of the client accessing the page.

Optionally, step S102 further comprises: when reading a URL from the URL list of the pages to be tested, determining whether the page under test requires JavaScript execution, and loading the page under test with a page request simulation tool according to the client information.

Optionally, step S103 further comprises: using regular expressions to extract from the DOM information of the page under test, and save, the URLs that match the subpage URL crawling rule;

inlining the CSS content of the DOM information into the DOM information to generate inlined DOM information;

checking the inlined DOM information with regular expressions according to the control attribute rule and the check rule to generate the check result.
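The CSS inlining step can be pictured with a minimal sketch. It assumes that "inlining" means replacing each stylesheet `<link>` with a `<style>` block that holds the stylesheet text; the function name `inline_css` and the `load_css` callback are hypothetical, and the regular expression only handles tags in which `rel` precedes `href`.

```python
import re

# Assumption: stylesheet links look like <link rel="stylesheet" href="...">,
# with rel written before href.
LINK_RE = re.compile(
    r'<link\b[^>]*\brel=["\']stylesheet["\'][^>]*\bhref=["\']([^"\']+)["\'][^>]*>',
    re.IGNORECASE,
)

def inline_css(dom_html, load_css):
    """Replace each stylesheet <link> with a <style> block holding its CSS.

    load_css is a hypothetical callback that maps an href to CSS text.
    """
    def to_style(match):
        return "<style>" + load_css(match.group(1)) + "</style>"
    return LINK_RE.sub(to_style, dom_html)

# Stand-in for fetching stylesheets over the network.
sheets = {"main.css": "#banner { display: none; }"}
page = ('<html><head><link rel="stylesheet" href="main.css"></head>'
        '<body><div id="banner"></div></body></html>')
inlined = inline_css(page, sheets.__getitem__)
```

With the style rules embedded in the same text as the markup, a later regular-expression check can see both in one pass.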

Optionally, the method further comprises: saving the final check result and the problem messages to a database, and notifying the page maintainers of the check result and the problem messages of the problem pages by email.

Optionally, the configuration information in the external configuration file further includes a crawl depth, which sets the number of page levels to check.

Optionally, the method further comprises: crawling, according to the subpage URL crawling rule, the URLs corresponding to the crawl depth to generate a new URL list of pages to be tested, then repeating step S102 and step S103 until all URLs in the URL list have been checked, yielding the final check result.

The invention also provides a device for monitoring page status, comprising:

a configuration information module, configured to obtain an external configuration file that includes a URL list of the pages to be tested, wherein the configuration information in the external configuration file includes the URLs of the pages to be tested, a subpage URL crawling rule, a control attribute rule, a check rule, a JavaScript switch, client information, and a crawl depth;

a URL parsing module, configured to read a URL from the URL list of the pages to be tested and, for that URL, generate the DOM information of the corresponding page using the client information and the JavaScript switch;

a page checking module, configured to check the DOM information of the page corresponding to the URL against the subpage URL crawling rule, the control attribute rule, and the check rule, and generate a check result;

a page monitoring module, configured to, if the crawl depth of the page under test is greater than 1, crawl the URLs corresponding to the crawl depth according to the subpage URL crawling rule to generate a new URL list of pages to be tested, and then re-run the URL parsing module and the page checking module until all URLs in the URL list have been checked, yielding the final check result.

Optionally, the subpage URL crawling rule specifies the domain name information or the specific URL to be checked;

the control attribute rule is used to obtain control information of the page under test;

the check rule is used to obtain problem messages appearing on the page under test;

the JavaScript switch determines whether JavaScript needs to be executed when checking a page;

the client information is used to simulate the attributes of the client accessing the page;

the crawl depth sets the number of page levels to check.

Optionally, the device further comprises a page judging module, configured to determine, when a URL is read from the URL list, whether the page under test requires JavaScript execution, and to load the page under test with a page request simulation tool according to the client information.

Optionally, the page checking module further comprises:

a URL acquisition module, configured to use regular expressions to extract from the DOM information of the page under test, and save, the URLs that match the subpage URL crawling rule;

a CSS inlining module, configured to inline the CSS content of the DOM information into the DOM information to generate inlined DOM information;

a DOM checking module, configured to check the inlined DOM information with regular expressions according to the control attribute rule and the check rule to generate the check result.

Optionally, the device further comprises a data processing module, configured to save the final check result and the problem messages to a database, and to notify the page maintainers of the check result and the problem messages of the problem pages by email.

Optionally, the configuration information in the external configuration file further includes a crawl depth, which sets the number of page levels to check.

Optionally, the device further comprises a page monitoring module, configured to crawl, according to the subpage URL crawling rule, the URLs corresponding to the crawl depth to generate a new URL list of pages to be tested, and then repeat step S102 and step S103 until all URLs in the URL list have been checked, yielding the final check result.

With the method and device for monitoring page status provided by the invention, either specific pages or all pages can be checked as the user requires, and problems on the pages under test, such as expired pages, broken links, invalid content, and access errors, can be detected promptly. In addition, page maintainers can set the check depth and decide whether JavaScript is loaded for the pages under test, and the monitoring results are written to a database for later queries.

Brief description of the drawings

To explain the technical solutions in the embodiments of the invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below are only some embodiments of the invention; those of ordinary skill in the art can derive other drawings from them without creative effort. In the drawings:

FIG. 1 is a schematic flowchart of a method for monitoring page status according to an embodiment of the invention;

FIG. 2 is a schematic diagram of a device for monitoring page status according to an embodiment of the invention.

Detailed description

To make the objects, technical solutions, and advantages of the embodiments of the invention clearer, the embodiments are described in further detail below with reference to the drawings. The exemplary embodiments and their descriptions serve to explain the invention and do not limit it.

Those skilled in the art will appreciate that embodiments of the invention may be implemented as a system, apparatus, device, method, or computer program product. Accordingly, the present disclosure may be embodied entirely in hardware, entirely in software (including firmware, resident software, microcode, and the like), or in a combination of hardware and software.

As used herein, the following terms are to be understood as follows:

URL: a Uniform Resource Locator is a concise representation of the location of, and access method for, a resource available on the Internet; it is the standard address of a resource on the Internet. Every file on the Internet has a unique URL, which indicates where the file is located and how the browser should handle it;

JavaScript: a scripting language used in web pages, commonly used to add dynamic behavior to a page and give users a smoother, more attractive browsing experience. JavaScript usually works by being embedded in HTML;

Cookie: information stored in the local browser, which can be used to communicate directly with the web server and obtain user information directly;

user-agent: part of the HTTP request headers, used to tell the server about the current client;

DOM: the Document Object Model, a model used to describe the code of a page;

CSS: an external style file for a web page, used to manage page style information in one place.

Exemplary method

The page status monitoring method of an exemplary embodiment of the invention is described below. Note that the embodiments of the invention are not limited in this respect; rather, they can be applied to any applicable scenario.

For example, FIG. 1 is a schematic flowchart of a method for monitoring page status according to an embodiment of the invention.

As shown in the figure, the method comprises:

Step S101: obtain an external configuration file that includes a URL list of the pages to be tested, wherein the configuration information in the external configuration file includes the URLs of the pages to be tested, a subpage URL crawling rule, a control attribute rule, a check rule, a JavaScript switch, and client information;

Step S102: read a URL from the URL list of the pages to be tested and, for that URL, generate the DOM information of the corresponding page using the client information and the JavaScript switch;

Step S103: check the DOM information of the page corresponding to the URL against the subpage URL crawling rule, the control attribute rule, and the check rule, and generate a check result.

Optionally, the subpage URL crawling rule specifies the domain name information or the specific URL to be checked;

the control attribute rule is used to obtain control information of the page under test;

the check rule is used to obtain problem messages appearing on the page under test;

the JavaScript switch determines whether JavaScript needs to be executed when checking a page;

the client information is used to simulate the attributes of the client accessing the page.

Optionally, step S102 further comprises: when reading a URL from the URL list, determining whether the page under test requires JavaScript execution, and loading the page under test with a page request simulation tool according to the client information.

Optionally, step S103 comprises: using regular expressions to extract from the DOM information of the page under test, and save, the URLs that match the subpage URL crawling rule;

inlining the CSS content of the DOM information into the DOM information to generate inlined DOM information;

checking the inlined DOM information with regular expressions according to the control attribute rule and the check rule to generate the check result.

Optionally, the method further comprises: saving the final check result and the problem messages to a database, and notifying the page maintainers of the check result and the problem messages of the problem pages by email.
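The reporting step might look like the following sketch. The patent does not name a database engine, table layout, or mail library; SQLite, the `page_check` table, the `save_results` and `build_alert` helpers, and the recipient address are all assumptions made for illustration. The email is only built here, not sent.

```python
import sqlite3
from email.message import EmailMessage

def save_results(conn, results):
    """Store (url, problem) pairs; the table layout is hypothetical."""
    conn.execute("CREATE TABLE IF NOT EXISTS page_check (url TEXT, problem TEXT)")
    conn.executemany("INSERT INTO page_check (url, problem) VALUES (?, ?)", results)
    conn.commit()

def build_alert(results, to_addr):
    """Build (but do not send) a notification email for the maintainers."""
    msg = EmailMessage()
    msg["Subject"] = f"{len(results)} page problem(s) found"
    msg["To"] = to_addr
    msg.set_content("\n".join(f"{url}: {problem}" for url, problem in results))
    return msg

results = [("http://sale.jd.com/app/test.html", "the event has expired")]
conn = sqlite3.connect(":memory:")  # in-memory database, for the sketch only
save_results(conn, results)
rows = conn.execute("SELECT url, problem FROM page_check").fetchall()
alert = build_alert(results, "maintainer@example.com")
```

Sending the message would additionally require an SMTP server, which is outside what the text specifies.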

Embodiment

The invention is described in detail below with reference to a specific embodiment. The embodiment is given only to describe the invention better and does not unduly limit it.

Step S101: obtain an external configuration file that includes a URL list of the pages to be tested, wherein the configuration information in the external configuration file includes the URLs of the pages to be tested, a subpage URL crawling rule, a control attribute rule, a check rule, a JavaScript switch, and client information;

Specifically, the user configures the external configuration file in advance, which yields the URL list of the pages to be tested. Depending on the user's needs, the configuration information may include, for example, the subpage URL crawling rule for the pages to crawl, the control attribute rule, the check rule, the JavaScript switch, the client information, the crawl depth, and whether to write to the database.
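As an illustration, such a configuration could be represented as follows. The patent does not specify a file format; JSON, the `MonitorConfig` class, and every field name below are assumptions chosen to mirror the items just listed.

```python
import json
from dataclasses import dataclass, field

@dataclass
class MonitorConfig:
    """Hypothetical in-memory form of the external configuration file."""
    urls: list                                          # pages to be tested
    crawl_rule: str = ""                                # subpage URL crawling rule
    control_rules: dict = field(default_factory=dict)   # control id -> expected state
    check_phrases: list = field(default_factory=list)   # problem messages to look for
    javascript_enabled: bool = False                    # JavaScript switch
    user_agent: str = "PC"                              # client information
    depth: int = 1                                      # crawl depth
    db_switch: bool = False                             # write results to the database?

raw = """
{
  "urls": ["http://sale.jd.com/app/test.html"],
  "crawl_rule": "http://sale.jd.com/",
  "control_rules": {"1": "visible"},
  "check_phrases": ["404", "the event has expired"],
  "javascript_enabled": true,
  "user_agent": "iPhone",
  "depth": 2
}
"""
config = MonitorConfig(**json.loads(raw))
```

Keys omitted from the file, such as `db_switch` here, fall back to the defaults declared on the class.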

The configuration information is described in detail below:

(1) The user can define URLs as the objects to be checked in this embodiment. The URLs of the pages to be tested are configured in the URL list of the external configuration file, for example http://sale.jd.com/app/test.html.

(2) Subpage URL crawling rule. The rule can match domain name information.

For example, if http://sale.jd.com/ is configured, the rule searches the page for all URL links that begin with http://sale.jd.com/.

Alternatively, the rule can match a specific URL.

For example, if http://sale.jd.com/abc.html is configured, the rule follows only the URL link http://sale.jd.com/abc.html.
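The two forms of the crawling rule can be sketched as below. How the program tells a domain rule from a specific-URL rule is not stated in the text; treating a rule that ends with `/` as a prefix is an assumption, as are the function name `extract_subpage_urls` and the simple `href` pattern.

```python
import re

# Assumption: links appear as quoted href attributes.
HREF_RE = re.compile(r'href=["\']([^"\']+)["\']', re.IGNORECASE)

def extract_subpage_urls(dom_html, rule):
    """Return the links that match the rule.

    Assumed convention: a rule ending in '/' is a domain prefix;
    any other rule must equal the link exactly.
    """
    links = HREF_RE.findall(dom_html)
    if rule.endswith("/"):
        return [u for u in links if u.startswith(rule)]
    return [u for u in links if u == rule]

dom = ('<a href="http://sale.jd.com/abc.html">a</a>'
       '<a href="http://sale.jd.com/x/y.html">b</a>'
       '<a href="http://other.example/z.html">c</a>')
by_domain = extract_subpage_urls(dom, "http://sale.jd.com/")
by_url = extract_subpage_urls(dom, "http://sale.jd.com/abc.html")
```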

(3) Control attribute rule. Page controls are the markup elements of each web page: for example, <div> denotes a region and <table> denotes a table. Controls are distinguished by their id attribute, so <table id=1> and <table id=2> denote two different tables. Each control also has other attributes, such as length and visibility. The user can therefore specify, for example, that a certain control must be visible; following the control attribute rule, the program locates that control in the page, determines whether its visibility matches the user's requirement, and reports a problem if it does not. Configuring the page control information serves the purpose of checking the page style in step S104.
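A visibility check of this kind might be sketched as follows. The text describes regular-expression checks on the DOM; this sketch swaps in Python's standard `html.parser` for readability and only inspects the element's inline `style` attribute, which is a simplification. The names `ControlFinder` and `control_is_visible` are hypothetical.

```python
from html.parser import HTMLParser

class ControlFinder(HTMLParser):
    """Record the attributes of the first element with the given id."""
    def __init__(self, control_id):
        super().__init__()
        self.control_id = control_id
        self.attrs = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if self.attrs is None and attrs.get("id") == self.control_id:
            self.attrs = attrs

def control_is_visible(dom_html, control_id):
    """Return True/False for visibility, or None if the control is missing."""
    finder = ControlFinder(control_id)
    finder.feed(dom_html)
    if finder.attrs is None:
        return None
    style = (finder.attrs.get("style") or "").replace(" ", "").lower()
    return "display:none" not in style and "visibility:hidden" not in style

dom = '<table id="1" style="display: none"></table><table id="2"></table>'
```

A control hidden by a stylesheet rule rather than an inline style would not be caught by this sketch.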

(4) Check rule. The user specifies problem types used to decide whether a page has a problem, for example "404 error", "the event has expired", or "the page does not exist". If the page displays an expiry message such as "the event has expired", the program captures the problem message of the page under test according to the check rule. Configuring the problem messages serves the purpose of checking page error messages in step S104.
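A minimal sketch of applying the check rule, assuming each configured problem type is stored as a regular-expression pattern; the function name `find_problems` and the sample patterns are illustrative only.

```python
import re

def find_problems(page_text, problem_patterns):
    """Return every configured problem pattern that occurs in the page text."""
    return [p for p in problem_patterns if re.search(p, page_text)]

# Hypothetical patterns modelled on the problem types named in the text.
patterns = [r"404", r"the event has expired", r"the page does not exist"]
hits = find_problems("<p>Sorry, the event has expired.</p>", patterns)
```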

(5) JavaScript switch, that is, the JavaScript enable switch (JavaScriptEnabled). Through it the user sets whether the JavaScript in the page under test should be executed. If JavaScript is executed, dynamically generated content appears in the final page content; if it is not, that content is absent (assuming the page has dynamically generated content). The JavaScript switch thus determines whether the DOM information produced in step S103 is the DOM generated after JavaScript execution.

(6) Client information (user-agent). The user sets the client information (user-agent) to simulate the client that accesses the page. Setting iPhone, for example, simulates access from a browser on an iPhone; setting PC simulates access from a browser on a computer. Configuring the client information serves the purpose of simulating the client in step S103.
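One way to picture the client setting is as a lookup from the configured name to a User-Agent header. The sketch below uses Python's standard `urllib.request` rather than the tool used in the embodiment, and the two User-Agent strings are shortened, illustrative values, not the ones any real deployment uses. No request is actually sent.

```python
import urllib.request

# Illustrative (shortened) User-Agent strings; real ones are longer.
USER_AGENTS = {
    "iPhone": "Mozilla/5.0 (iPhone; CPU iPhone OS 10_0 like Mac OS X)",
    "PC": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
}

def build_request(url, client):
    """Build a request that presents itself as the configured client."""
    return urllib.request.Request(url, headers={"User-Agent": USER_AGENTS[client]})

req = build_request("http://sale.jd.com/app/test.html", "iPhone")
# urllib stores header names capitalized, hence "User-agent" on lookup.
sent_agent = req.get_header("User-agent")
```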

(7) Database switch (DBSwitch). The user can turn on the database switch so that the captured page messages are written to the database. The database configuration must also supply the database address, the table name, and the account and password used to access the database. Configuring the database switch serves the purpose of writing the final check result to the database in step S105.

In one embodiment of the invention, the configuration information may further include a crawl depth (Depth), which the user sets so that the monitoring program checks pages recursively. If the crawl depth is set to 1, the monitoring program does not crawl the links in the pages under test and finishes once the entry pages have been checked. If the crawl depth is set to 2, the monitoring program crawls the links that match the user-defined link rule and saves them; after all entry pages have been checked, it checks the saved links, and so on. Configuring the crawl depth serves the purpose of the recursive check in step S104.
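The level-by-level behaviour described above can be sketched as follows. The function name `crawl_levels` and the `get_links` callback are hypothetical, and skipping links that were already seen is an added assumption to keep the sketch from looping; the text does not mention de-duplication.

```python
def crawl_levels(entry_urls, depth, get_links):
    """Return URLs in the order they would be checked, level by level.

    get_links(url) is a hypothetical callback returning the links on a page
    that match the subpage URL crawling rule.
    """
    checked = []
    seen = set(entry_urls)
    level = list(entry_urls)
    for current in range(1, depth + 1):
        next_level = []
        for url in level:
            checked.append(url)  # steps S102 and S103 would run here
            if current < depth:  # only collect links if a deeper level remains
                for link in get_links(url):
                    if link not in seen:
                        seen.add(link)
                        next_level.append(link)
        level = next_level
    return checked

# Toy link graph standing in for real pages.
graph = {"a": ["b", "c"], "b": ["d"], "c": [], "d": []}
links_of = lambda url: graph.get(url, [])
```

With a depth of 1 only the entry page `a` is checked; a depth of 2 adds `b` and `c`, and a depth of 3 adds `d`.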

Step S102: Read the URLs in the URL list of the pages to be tested and, for each URL, generate the Document Object Model (DOM) information of the corresponding page using the client information and the JavaScript switch.

Note that when a URL is read from the URL list, it is also necessary to determine whether the page to be tested needs to execute JavaScript; the page is then executed by a page request simulation tool according to the client information.

Specifically, the JavaScriptEnabled setting is read, and if JavaScriptEnabled is true, JavaScript execution is enabled.

For example, the URL of a page to be tested is http://testabc.com/index.html, and its HTML content is as follows:

<html>
<script>
document.write("Hello");
</script>
</html>

If this page is executed in a browser, "Hello" is displayed on the page. Likewise, if JavaScript is executed, "Hello" appears in the Document Object Model (DOM) information generated for the page; if JavaScript is not executed, the keyword "Hello" does not appear.

In addition, an ordinary HTTP request tool can return the DOM information when the requested page is static, but when the requested page is dynamic, the returned DOM information does not contain the dynamically generated content.

In an embodiment of the present invention, the page to be tested is requested with the phantomjs tool. Once the JavaScript switch is turned on (JavaScript execution allowed), the DOM information can be obtained for both dynamic and static pages for later analysis.

In the prior art, a user client is usually simulated by modifying the HTTP request headers, which is cumbersome and error-prone.

In the present invention, a user login scenario can be simulated by setting cookie information, and a user client can be simulated by setting the user-agent value, since the user-agent identifies the client. For example, the configuration entry <user-agent>iPhone tells the monitoring program to access the page as an iPhone mobile terminal.

Take http://www.163.com as an example: the website returns different page content depending on the user-agent value. In a real test scenario the output for a specific client must be inspected, for example to find out whether a user visiting http://www.163.com from an iPhone mobile terminal can see the sports section. Setting the user-agent value makes it possible to simulate an iPhone user.
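The client simulation described above can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation (which uses phantomjs): the `CLIENT_PROFILES` table, the `build_request` helper, the cookie value and the User-Agent string are all assumptions chosen for the example. The request is only constructed, not sent.

```python
from urllib.request import Request

# Hypothetical mapping from a configured client name to a User-Agent string.
# The string below is an illustrative iPhone Safari value, not taken from the patent.
CLIENT_PROFILES = {
    "iPhone": (
        "Mozilla/5.0 (iPhone; CPU iPhone OS 9_1 like Mac OS X) "
        "AppleWebKit/601.1.46 (KHTML, like Gecko) Version/9.0 "
        "Mobile/13B143 Safari/601.1"
    ),
}


def build_request(url, client, cookie=None):
    """Build (but do not send) a request that simulates the given client."""
    headers = {"User-Agent": CLIENT_PROFILES[client]}
    if cookie:
        # A session cookie simulates a logged-in user.
        headers["Cookie"] = cookie
    return Request(url, headers=headers)


# Hypothetical usage: simulate a logged-in iPhone user.
req = build_request("http://www.163.com", "iPhone", cookie="session=abc123")
```

Keeping the profiles in one table means the configuration only has to name the client (for example `iPhone`) instead of repeating raw header values.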

Finally, the DOM information of the page to be tested is generated using the client information and the JavaScript switch, and is saved for further analysis.

Step S103: Check the DOM information of the page corresponding to the URL according to the sub-page URL crawling rule, the control attribute rule and the checking rule, and generate an inspection result.

First, regular expressions are used to obtain, from the DOM information of the page to be tested, the URLs that conform to the sub-page URL crawling rule, and these URLs are saved. Using regular-expression matching against the page, the present invention can filter out the items to be analyzed, such as problem descriptions, links to sub-activity pages, attributes of specific controls and image links.
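The URL extraction step above might look like the following sketch. The function name `extract_subpage_urls`, the `href` pattern and the sample rule are assumptions for illustration; the patent only states that regular expressions are used.

```python
import re

# Matches the value of an href attribute in single or double quotes.
HREF_RE = re.compile(r'href=["\']([^"\']+)["\']', re.IGNORECASE)


def extract_subpage_urls(dom, url_rule):
    """Return links in the DOM text that match the crawl rule, in order, without duplicates."""
    rule = re.compile(url_rule)
    found = []
    for href in HREF_RE.findall(dom):
        if rule.search(href) and href not in found:
            found.append(href)
    return found


# Hypothetical DOM text and a rule restricting the crawl to one domain.
dom = (
    '<a href="http://testabc.com/sale/1.html">A</a>'
    '<a href="http://other.com/x">B</a>'
    '<a href="http://testabc.com/sale/1.html">A again</a>'
)
urls = extract_subpage_urls(dom, r"^http://testabc\.com/")
```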

Second, the DOM information is parsed before inlining, and the CSS content of the page at the URL is inlined into it, producing inlined DOM information. With the inlined DOM information, the style properties of a specific control can be read directly.

For example, before the DOM information is inlined, the display property of a control has to be looked up through the corresponding rule in the CSS file; after inlining, the control's properties can be read directly, for example: <span style="display:block">Hello</span>.
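A deliberately simplified sketch of the CSS step described above, merging stylesheet rules into the elements they apply to. It assumes the stylesheet contains only plain tag selectors (such as `span` or `a`) and that the elements carry no existing `style` attribute; a real implementation would need a proper CSS parser and selector matching. The name `inline_css` is hypothetical.

```python
import re

# Matches "selector { declarations }" for plain tag selectors only.
RULE_RE = re.compile(r"([a-zA-Z][\w-]*)\s*\{([^}]*)\}")


def inline_css(html, css):
    """Copy the declarations of simple tag selectors into style attributes."""
    for tag, body in RULE_RE.findall(css):
        # Normalize "display: block; " into "display: block".
        decl = ";".join(p.strip() for p in body.split(";") if p.strip())
        opening = re.compile(r"<%s(\s[^>]*)?>" % re.escape(tag))
        html = opening.sub(
            lambda m: '<%s style="%s"%s>' % (tag, decl, m.group(1) or ""),
            html,
        )
    return html


inlined = inline_css(
    '<span>Hello</span><a href="/x">go</a>',
    "span { display: block; } a { display: none }",
)
```

After this step the display property sits on the element itself, so later checks can work on a single text without consulting the CSS file.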

Finally, the inlined DOM information is checked with regular expressions according to the control attribute rule and the checking rule, and an inspection result is generated.

Specifically, the checking rules set by the user are applied as regular expressions to determine whether the inlined DOM information contains problem description content (for example, "a 404 error occurred", "the activity has expired" or "the page does not exist"); if it does, the problem description content is recorded. For example, if the external configuration file is configured to check for page expiration and the page to be tested contains keywords such as "page expired", the expiration problem is recorded for the URL of that activity page.
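The keyword check above can be sketched as a table of named regular expressions applied to the DOM text. The rule names, patterns and the `check_page` helper are assumptions for illustration; in the patent these rules come from the external configuration file.

```python
import re

# Hypothetical checking rules: rule name -> regular expression for a problem description.
CHECK_RULES = {
    "not_found": r"404 error|page does not exist",
    "expired": r"activity has expired|page expired",
}


def check_page(url, dom, rules=CHECK_RULES):
    """Return one record per rule whose pattern occurs in the DOM text."""
    problems = []
    for name, pattern in rules.items():
        match = re.search(pattern, dom, re.IGNORECASE)
        if match:
            problems.append({"url": url, "rule": name, "text": match.group(0)})
    return problems


problems = check_page(
    "http://testabc.com/sale/1.html",
    "<p>Sorry, this activity has expired.</p>",
)
```

Recording the rule name alongside the matched text makes it straightforward to classify problems later, for example when writing them to a database.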

The control attribute checking rules in the configuration information of the external configuration file are applied as regular expressions to match the control attributes to be checked in the DOM tree after CSS inlining. For example, if a certain link (an <a> tag) is required to be invisible, and the tag is matched while its display property makes it visible, the error is recorded.
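The link visibility example above could be checked as follows. This sketch assumes the CSS has already been merged into `style` attributes and treats a link with no declared display value as visible; the helper name `visible_links` is hypothetical.

```python
import re

A_TAG_RE = re.compile(r"<a\b([^>]*)>", re.IGNORECASE)
DISPLAY_RE = re.compile(r"display\s*:\s*([\w-]+)", re.IGNORECASE)


def visible_links(inlined_dom):
    """Return opening <a> tags that are expected to be hidden but are not display:none."""
    errors = []
    for match in A_TAG_RE.finditer(inlined_dom):
        display = DISPLAY_RE.search(match.group(1))
        # Assumption: a link without a declared display value counts as visible.
        value = display.group(1).lower() if display else "inline"
        if value != "none":
            errors.append(match.group(0))
    return errors


errors = visible_links(
    '<a style="display: none" href="/old">old</a>'
    '<a style="display: block" href="/promo">promo</a>'
)
```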

Step S104: If the crawling depth of the page to be tested is greater than 1, crawl the URLs corresponding to the crawling depth according to the sub-page URL crawling rule to generate a new URL list of pages to be tested, then repeat step S102 and step S103 until all URLs in the URL lists of the pages to be tested have been checked, yielding the final inspection result.
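The depth-controlled loop of step S104 can be sketched level by level. The `crawl` function and its three callables (`fetch_dom`, `extract_links`, `check`) are hypothetical stand-ins for steps S102 and S103; skipping already-seen URLs is an added assumption to avoid checking a page twice.

```python
def crawl(entry_urls, depth, fetch_dom, extract_links, check):
    """Check entry pages, then linked pages, down to the configured depth."""
    results = []
    seen = set(entry_urls)
    current = list(entry_urls)
    for level in range(1, depth + 1):
        next_level = []
        for url in current:
            dom = fetch_dom(url)              # step S102: build the DOM text
            results.extend(check(url, dom))   # step S103: apply the rules
            if level < depth:
                # Collect links for the next level only if we will visit it.
                for link in extract_links(dom):
                    if link not in seen:
                        seen.add(link)
                        next_level.append(link)
        current = next_level
    return results
```

With a depth of 1 only the entry pages are checked; with a depth of 2 the links collected from the entry pages are checked as well, matching the behaviour described for the crawling depth setting.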

Once all URLs have been checked, the problems can be recorded in the database by category, using the externally configured database information. In addition, the problems are sent by email to the responsible persons on the externally configured recipient list.

FIG. 2 is a schematic diagram of a page status monitoring device according to an embodiment of the present invention. The device 2 includes:

a configuration information module 21, configured to obtain an external configuration file that includes a URL list of pages to be tested, wherein the configuration information in the external configuration file includes the URLs of the pages to be tested, a sub-page URL crawling rule, a control attribute rule, a checking rule, a JavaScript switch, client information and a crawling depth;

a URL parsing module 22, configured to read the URLs in the URL list of the pages to be tested and, for each URL, generate the DOM information of the corresponding page using the client information and the JavaScript switch;

a page checking module 23, configured to check the DOM information of the page corresponding to the URL according to the sub-page URL crawling rule, the control attribute rule and the checking rule, and generate an inspection result.

Optionally, the sub-page URL crawling rule sets the domain name information or the specific URLs to be checked;

the control attribute rule is used to obtain control information of the page to be tested;

the checking rule is used to obtain problem prompt information appearing on the page to be tested;

the JavaScript switch is used to determine whether the page being checked needs to execute JavaScript;

the client information is used to simulate the attributes of the client accessing the page.

Optionally, the device further includes a page judging module 24, configured to determine, when a URL in the URL list is read, whether the page to be tested needs to execute JavaScript, and to execute the page to be tested through a page request simulation tool according to the client information.

Optionally, the page checking module 23 further includes:

a URL obtaining module 231, configured to use regular expressions to obtain, from the DOM information of the page to be tested, the URLs that conform to the sub-page URL crawling rule, and to save them;

a CSS inlining module 232, configured to inline the CSS content referenced by the DOM information into the DOM information, generating inlined DOM information;

a DOM checking module 233, configured to check the inlined DOM information with regular expressions according to the control attribute rule and the checking rule, and generate the inspection result.

Optionally, the device further includes a data processing module 25, configured to save the final inspection result and the problem prompt information to a database, and to notify page maintainers by email of the inspection result and the prompt information of the problem pages.

Optionally, the configuration information in the external configuration file further includes a crawling depth, which sets the number of page levels to be checked.

Optionally, the device further includes a page monitoring module 26, configured to, if the crawling depth of the page to be tested is greater than 1, crawl the URLs corresponding to the crawling depth according to the sub-page URL crawling rule to generate a new URL list of pages to be tested, and then repeatedly run the URL parsing module and the page checking module until all URLs in the URL lists of the pages to be tested have been checked, yielding the final inspection result.

Because the page status monitoring device provided by the present invention corresponds to the method described above, it is not described again here.

With the page status monitoring method and device provided by the present invention, either specific pages or all pages can be checked according to user needs, and problems with the pages to be tested, such as expired pages, broken links, invalid content and access errors, can be found promptly. In addition, page maintainers can set the inspection depth and choose whether JavaScript is loaded for the pages to be tested, and the monitoring results are written to a database for later queries.

Furthermore, although the operations of the method of the present invention are depicted in the drawings in a particular order, this does not require or imply that all of the illustrated operations must be performed to achieve the desired result. Additionally or alternatively, certain steps may be omitted, multiple steps may be combined into one step, and/or one step may be split into multiple steps.

The specific embodiments described above explain the purpose, technical solutions and beneficial effects of the present invention in further detail. It should be understood that they are only specific embodiments of the present invention and do not limit its scope of protection; any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims (14)

1. A page status monitoring method, characterized in that the method comprises:
Step S101: obtaining an external configuration file that includes a URL list of pages to be tested, wherein the configuration information in the external configuration file includes the URLs of the pages to be tested, a sub-page URL crawling rule, a control attribute rule, a checking rule, a JavaScript switch and client information;
Step S102: reading the URLs in the URL list of the pages to be tested and, for each URL, generating the document object model information of the page corresponding to the URL using the client information and the JavaScript switch;
Step S103: checking the document object model information of the page corresponding to the URL according to the sub-page URL crawling rule, the control attribute rule and the checking rule, and generating an inspection result.

2. The method according to claim 1, characterized in that:
the sub-page URL crawling rule sets the domain name information or the specific URLs to be checked;
the control attribute rule is used to obtain control information of the page to be tested;
the checking rule is used to obtain problem prompt information appearing on the page to be tested;
the JavaScript switch is used to determine whether the page being checked needs to execute JavaScript;
the client information is used to simulate the attributes of the client accessing the page.

3. The method according to claim 1, characterized in that step S102 further comprises:
when reading a URL in the URL list of the pages to be tested, determining whether the page to be tested needs to execute JavaScript, and executing the page to be tested through a page request simulation tool according to the client information.

4. The method according to claim 1, characterized in that step S103 comprises:
using regular expressions to obtain, from the document object model information of the page to be tested, the URLs that conform to the sub-page URL crawling rule, and saving them;
inlining the CSS content referenced by the document object model information into the document object model information, generating inlined document object model information;
checking the inlined document object model information with regular expressions according to the control attribute rule and the checking rule, and generating the inspection result.

5. The method according to claim 2, characterized by further comprising:
saving the final inspection result and the problem prompt information to a database, and notifying page maintainers by email of the inspection result and the prompt information of the problem pages.

6. The method according to claim 1, characterized in that the configuration information in the external configuration file further includes:
a crawling depth, which sets the number of page levels to be checked.

7. The method according to claim 6, characterized in that the method further comprises:
crawling the URLs corresponding to the crawling depth according to the sub-page URL crawling rule to generate a new URL list of pages to be tested, and then repeating step S102 and step S103 until all URLs in the URL lists of the pages to be tested have been checked, yielding the final inspection result.

8. A page status monitoring device, characterized in that the device comprises:
a configuration information module, configured to obtain an external configuration file that includes a URL list of pages to be tested, wherein the configuration information in the external configuration file includes the URLs of the pages to be tested, a sub-page URL crawling rule, a control attribute rule, a checking rule, a JavaScript switch, client information and a crawling depth;
a URL parsing module, configured to read the URLs in the URL list of the pages to be tested and, for each URL, generate the document object model information of the page corresponding to the URL using the client information and the JavaScript switch;
a page checking module, configured to check the document object model information of the page corresponding to the URL according to the sub-page URL crawling rule, the control attribute rule and the checking rule, and generate an inspection result;
a page monitoring module, configured to, if the crawling depth of the page to be tested is greater than 1, crawl the URLs corresponding to the crawling depth according to the sub-page URL crawling rule to generate a new URL list of pages to be tested, and then repeatedly run the URL parsing module and the page checking module until all URLs in the URL lists of the pages to be tested have been checked, yielding the final inspection result.

9. The device according to claim 8, characterized in that:
the sub-page URL crawling rule sets the domain name information or the specific URLs to be checked;
the control attribute rule is used to obtain control information of the page to be tested;
the checking rule is used to obtain problem prompt information appearing on the page to be tested;
the JavaScript switch is used to determine whether the page being checked needs to execute JavaScript;
the client information is used to simulate the attributes of the client accessing the page.

10. The device according to claim 8, characterized by further comprising a page judging module, configured to determine, when a URL in the URL list of the pages to be tested is read, whether the page to be tested needs to execute JavaScript, and to execute the page to be tested through a page request simulation tool according to the client information.

11. The device according to claim 8, characterized in that the page checking module further comprises:
a URL obtaining module, configured to use regular expressions to obtain, from the document object model information of the page to be tested, the URLs that conform to the sub-page URL crawling rule, and to save them;
a CSS inlining module, configured to inline the CSS content referenced by the document object model information into the document object model information, generating inlined document object model information;
a DOM checking module, configured to check the inlined document object model information with regular expressions according to the control attribute rule and the checking rule, and generate the inspection result.

12. The device according to claim 9, characterized by further comprising:
a data processing module, configured to save the final inspection result and the problem prompt information to a database, and to notify page maintainers by email of the inspection result and the prompt information of the problem pages.

13. The device according to claim 9, characterized in that the configuration information in the external configuration file further includes:
a crawling depth, which sets the number of page levels to be checked.

14. The device according to claim 13, characterized in that the device further comprises:
a page monitoring module, configured to crawl the URLs corresponding to the crawling depth according to the sub-page URL crawling rule to generate a new URL list of pages to be tested, and then repeat step S102 and step S103 until all URLs in the URL lists of the pages to be tested have been checked, yielding the final inspection result.
CN201610878315.5A 2016-10-08 2016-10-08 Page state monitoring method and device Active CN107918575B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610878315.5A CN107918575B (en) 2016-10-08 2016-10-08 Page state monitoring method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610878315.5A CN107918575B (en) 2016-10-08 2016-10-08 Page state monitoring method and device

Publications (2)

Publication Number Publication Date
CN107918575A true CN107918575A (en) 2018-04-17
CN107918575B CN107918575B (en) 2021-03-30

Family

ID=61892161

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610878315.5A Active CN107918575B (en) 2016-10-08 2016-10-08 Page state monitoring method and device

Country Status (1)

Country Link
CN (1) CN107918575B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020143659A1 (en) * 2001-02-27 2002-10-03 Paula Keezer Rules-based identification of items represented on web pages
US20030229677A1 (en) * 2002-06-06 2003-12-11 International Business Machines Corporation Method and system for determining the availability of in-line resources within requested web pages
CN101989303A (en) * 2010-11-02 2011-03-23 浙江大学 Automatic barrier-free network detection method
CN103810086A (en) * 2012-11-08 2014-05-21 腾讯科技(深圳)有限公司 Method, device and system for processing website causing browser breakdown

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107918575B (en) * 2016-10-08 2021-03-30 北京京东尚科信息技术有限公司 Page state monitoring method and device

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107918575B (en) * 2016-10-08 2021-03-30 北京京东尚科信息技术有限公司 Page state monitoring method and device
CN110874427A (en) * 2018-09-03 2020-03-10 菜鸟智能物流控股有限公司 Webpage information crawling method, device and system and electronic equipment
CN109768973A (en) * 2018-12-28 2019-05-17 易票联支付有限公司 A kind of publication monitoring method, system and the device of security bulletin
CN111428161A (en) * 2019-01-10 2020-07-17 北京京东尚科信息技术有限公司 Web page processing method and system, computer system and computer readable storage medium
CN110351162A (en) * 2019-05-30 2019-10-18 平安银行股份有限公司 Method, device, computer equipment and storage medium for monitoring page availability
CN111581063A (en) * 2020-06-09 2020-08-25 北京大米未来科技有限公司 Data processing method, readable storage medium and electronic device
CN113761431A (en) * 2020-09-24 2021-12-07 北京沃东天骏信息技术有限公司 Method and device for checking integrity of page
CN114443411A (en) * 2020-11-05 2022-05-06 阿里巴巴集团控股有限公司 Method, device and equipment for detecting page rendering abnormity and storage medium
CN112559919A (en) * 2020-12-22 2021-03-26 平安银行股份有限公司 Checking method and device for online document uploading, electronic equipment and storage medium
CN112559919B (en) * 2020-12-22 2023-11-10 平安银行股份有限公司 Method and device for checking online document uploading, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN107918575B (en) 2021-03-30

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant