CN111143650B

CN111143650B - Method, device, medium and electronic device for obtaining page data

Info

Publication number: CN111143650B
Application number: CN201911295167.4A
Authority: CN
Inventors: 王政操; 张霞
Original assignee: Neusoft Corp
Current assignee: Beijing Jinchengxin Technology Consulting Services Co.,Ltd.
Priority date: 2019-12-16
Filing date: 2019-12-16
Publication date: 2024-04-26
Anticipated expiration: 2039-12-16
Also published as: CN111143650A

Abstract

The disclosure relates to a method, a device, a medium and an electronic device for acquiring page data. The method comprises: receiving an original request for a page operation; acquiring characteristic parameters of the original request and storing them in a target structure; generating a data acquisition request according to the target structure, the data acquisition request being used to request the page data of the response page of the original request; sending the data acquisition request to a server; and receiving the page data of the response page sent by the server. Because the data acquisition request is generated from the characteristic parameters of the original request and the data is requested from the server on that basis, the page data can be acquired without simulating the browser's request-response operations. This reduces the complexity of acquiring page data and improves the efficiency and accuracy of acquisition.

Description

Method, device, medium and electronic device for obtaining page data

Technical Field

The present disclosure relates to the field of computer technology, and in particular to a method, device, medium and electronic device for acquiring page data.

Background Art

With the development of computer technology, the demand for network data keeps growing. In some scenarios a user wants to obtain the data shown on a certain page, but if the page provides no data export function, the data is difficult to obtain. To solve this problem, the prior art usually obtains the page data of such pages with crawler technology.

However, prior-art web crawlers are usually implemented with a headless browser that simulates page operations to obtain the data in the page. Because the process simulates user operations, every step of the simulation must be consistent with the user's operations and free of errors; if any intermediate step fails, the entire process fails and the page data cannot be obtained.

Summary of the Invention

The purpose of the present disclosure is to provide a method, device, medium and electronic device for acquiring page data conveniently and accurately.

To achieve the above purpose, according to a first aspect of the present disclosure, a method for acquiring page data is provided, the method comprising:

receiving an original request for a page operation;

obtaining characteristic parameters of the original request, and storing the characteristic parameters in a target structure;

generating a data acquisition request according to the target structure, wherein the data acquisition request is used to request page data of a response page of the original request;

sending the data acquisition request to a server;

receiving the page data of the response page sent by the server.

Optionally, before the step of sending the data acquisition request to the server, the method further includes:

if it is determined that a login operation is required before the original request can be responded to, obtaining a response token corresponding to the login operation;

the generating a data acquisition request according to the target structure includes:

generating the data acquisition request according to the target structure and the response token.

Optionally, the obtaining a response token corresponding to the login operation includes:

if a login script corresponding to the original request exists, replaying the login script so as to obtain the response token after the login information used by the login script has been authenticated, wherein the login script is pre-recorded with a login plug-in.

Optionally, during playback of the login script, if the current operation is a verification code input operation, the verification code image in the current page is obtained and sent to a verification code recognition module, which recognizes the image and obtains the verification code information; the verification code input operation is then performed with the verification code information returned by the recognition module.

Optionally, the obtaining a response token corresponding to the login operation includes:

if a login operation is required before the original request can be responded to and no login script corresponding to the original request exists, outputting prompt information to prompt the user to log in;

detecting the login status through a login plug-in;

receiving the response token sent by the login plug-in when it detects a successful login.

Optionally, after the step of receiving the page data of the response page sent by the server, the method further includes:

determining a text similarity parameter between first page text information of the response page and second page text information of a response sample corresponding to the original request;

determining, according to the text similarity parameter, whether the response page matches the response sample;

if the response page is determined to match the response sample, storing the page data of the response page.

Optionally, the storing the page data of the response page includes:

storing the page data of the response page in a structured file, so that it can be stored in a database based on the structured file.
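The "structured file, then database" storage step can be sketched as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: it assumes JSON as the structured file format and SQLite as the database, and the table name `page_data` and the function `store_page_data` are hypothetical.

```python
import json
import sqlite3
from pathlib import Path


def store_page_data(records, json_path, db_path):
    """Write page-data records to a structured JSON file, then load that file into SQLite.

    Returns the number of rows inserted. Each record is kept as one JSON text
    column, so records with differing keys need no schema changes.
    """
    json_path = Path(json_path)
    # Step 1: persist the page data as a structured (JSON) file.
    json_path.write_text(json.dumps(records, ensure_ascii=False), encoding="utf-8")

    # Step 2: load the database from the structured file, not from memory.
    rows = json.loads(json_path.read_text(encoding="utf-8"))
    conn = sqlite3.connect(db_path)
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "CREATE TABLE IF NOT EXISTS page_data ("
                "id INTEGER PRIMARY KEY AUTOINCREMENT, record TEXT NOT NULL)"
            )
            conn.executemany(
                "INSERT INTO page_data (record) VALUES (?)",
                [(json.dumps(r, ensure_ascii=False),) for r in rows],
            )
    finally:
        conn.close()
    return len(rows)
```

Keeping the intermediate file means the same structured output could also be loaded into a different database later without re-fetching the page.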

According to a second aspect of the present disclosure, a device for acquiring page data is provided, the device comprising:

a first receiving module, configured to receive an original request for a page operation;

a first storage module, configured to obtain characteristic parameters of the original request and store the characteristic parameters in a target structure;

a generating module, configured to generate a data acquisition request according to the target structure, wherein the data acquisition request is used to request page data of a response page of the original request;

a sending module, configured to send the data acquisition request to a server;

a second receiving module, configured to receive the page data of the response page sent by the server.

Optionally, the device further comprises:

an acquisition module, configured to, before the sending module sends the data acquisition request to the server, obtain a response token corresponding to the login operation if it is determined that a login operation is required before the original request can be responded to;

the generating module is configured to generate the data acquisition request according to the target structure and the response token.

Optionally, the acquisition module is configured to: if a login script corresponding to the original request exists, replay the login script so as to obtain the response token after the login information used by the login script has been authenticated, wherein the login script is pre-recorded with a login plug-in.

Optionally, the acquisition module includes:

an output submodule, configured to output prompt information to prompt the user to log in when a login operation is required before the original request can be responded to and no login script corresponding to the original request exists;

a detection submodule, configured to detect the login status through a login plug-in;

a receiving submodule, configured to receive the response token sent by the login plug-in when it detects a successful login.

Optionally, the device further comprises:

a first determining module, configured to, after the second receiving module receives the page data of the response page sent by the server, determine a text similarity parameter between first page text information of the response page and second page text information of a response sample corresponding to the original request;

a second determining module, configured to determine, according to the text similarity parameter, whether the response page matches the response sample;

a second storage module, configured to store the page data of the response page when the response page is determined to match the response sample.

Optionally, the second storage module is configured to store the page data of the response page in a structured file, so that it can be stored in a database based on the structured file.

According to a third aspect of the present disclosure, a computer-readable storage medium is provided, on which a computer program is stored; when executed by a processor, the program implements the steps of any method of the first aspect.

According to a fourth aspect of the present disclosure, an electronic device is provided, including:

a memory on which a computer program is stored;

a processor, configured to execute the computer program in the memory to implement the steps of any method of the first aspect.

In the above technical solution, an original request for a page operation is received; characteristic parameters of the original request are obtained and stored in a target structure; a data acquisition request is generated according to the target structure; the data acquisition request is sent to the server; and the page data of the response page sent by the server is received. Because the data acquisition request is generated from the characteristic parameters of the original request and the data is requested from the server on that basis, the page data can be obtained without simulating the browser's request-response operations. On the one hand, this reduces the complexity of page data acquisition, simplifies it, and improves its reliability and accuracy. On the other hand, it broadens the range of scenarios in which the method can be used.

Other features and advantages of the present disclosure will be described in detail in the detailed description that follows.

BRIEF DESCRIPTION OF THE DRAWINGS

The accompanying drawings provide a further understanding of the present disclosure and constitute a part of the specification. Together with the following detailed description, they serve to explain the present disclosure but do not limit it. In the drawings:

FIG. 1 is a flowchart of a method for acquiring page data according to an embodiment of the present disclosure;

FIG. 2 is a block diagram of an apparatus for acquiring page data according to an embodiment of the present disclosure;

FIG. 3 is a block diagram of an electronic device according to an exemplary embodiment;

FIG. 4 is a block diagram of an electronic device according to an exemplary embodiment.

Detailed Description

Specific embodiments of the present disclosure are described in detail below with reference to the accompanying drawings. It should be understood that the embodiments described here are only used to illustrate and explain the present disclosure and are not intended to limit it.

FIG. 1 is a flowchart of a method for acquiring page data according to an embodiment of the present disclosure. As shown in FIG. 1, the method includes:

In S11, an original request for a page operation is received. The page operation may be, for example, a page query operation or a view operation, and the original request is the request initiated to perform that page operation.

In S12, the characteristic parameters of the original request are obtained and stored in the target structure.

Optionally, the characteristic parameters may include a path parameter of the original request, which can be obtained from the URL (Uniform Resource Locator) of the original request. The original request may request an entire page, in which case the path parameter is the URL of the original request; it may also request only part of the data in a page, for example the body of the page, in which case the path parameter of the original request is the URL corresponding to that body.

Optionally, the characteristic parameters may also include query parameters, which constrain the data displayed in the page. For example, if the system is queried for the list of students scoring above 80, the query parameter is "score above 80". Because original requests may come from different systems, in one embodiment the query parameters are obtained from the URL of the original request, and in another embodiment they are obtained from the request body of the original request; the present disclosure does not limit this.
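Extracting path and query parameters from either the URL or the request body can be sketched with Python's standard `urllib.parse`. This is an illustration, not the disclosed code: the function name, the returned dictionary layout, and the assumption that bodies are either form-encoded or JSON are all hypothetical.

```python
import json
from urllib.parse import parse_qsl, urlsplit


def extract_characteristic_parameters(url, body=None, content_type=None):
    """Split an original request into path segments and query parameters.

    Query parameters are read from the URL; if the request carries a body,
    a JSON body (content_type "application/json") or a form-encoded body is
    merged in as well. Which body format a given system uses is an assumption.
    """
    parts = urlsplit(url)
    path_segments = [seg for seg in parts.path.split("/") if seg]
    query = dict(parse_qsl(parts.query, keep_blank_values=True))
    if body:
        if content_type == "application/json":
            query.update(json.loads(body))
        else:
            query.update(parse_qsl(body, keep_blank_values=True))
    return {
        "base": f"{parts.scheme}://{parts.netloc}",
        "path": path_segments,
        "query": query,
    }
```

For example, a hypothetical request `http://localhost:8080/api/students/list?minScore=80` yields the path segments `["api", "students", "list"]` and the query parameter `{"minScore": "80"}`. A system that sends `minScore=80` in a POST body yields the same query parameter.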

After the above characteristic parameters of the original request are obtained, they can be stored in the target structure. For example, the target structure may be a JSON file with the following format:

URL: http://localhost:8080/api/{user-defined path A}/{user-defined path B}

Path parameters: ?{user-defined parameter a}=parameter value&{user-defined parameter b}=parameter value
Query parameters:

Therefore, after the characteristic parameters are extracted, they can be stored as the corresponding parameter values in the target structure. Original requests of different forms thus receive a standardized representation and can be represented and managed uniformly, which improves the compatibility of the method and broadens its application scenarios and scope.
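One possible shape for the target structure is shown below, serialized to JSON so that requests from different systems share a single representation. It is a sketch under assumptions: the class `TargetStructure`, its field names, and the `{placeholder}` convention for path parameters are hypothetical and are not prescribed by the text.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class TargetStructure:
    """Hypothetical normalized record of one original request."""
    url: str                                          # URL template with {placeholders} for path parameters
    path_params: dict = field(default_factory=dict)   # values substituted into the template
    query_params: dict = field(default_factory=dict)  # constraints on the displayed data
    method: str = "GET"

    def resolved_url(self):
        """Fill the path placeholders, e.g. /api/{pathA} -> /api/students."""
        return self.url.format(**self.path_params)

    def to_json(self):
        return json.dumps(asdict(self), ensure_ascii=False, sort_keys=True)

    @classmethod
    def from_json(cls, text):
        return cls(**json.loads(text))
```

The JSON round trip is what lets a stored structure be reloaded later and turned into a data acquisition request without the original browser session.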

In S13, a data acquisition request is generated according to the target structure; the data acquisition request is used to request the page data of the response page of the original request.

In S14, the data acquisition request is sent to the server.

That is, in the present application, when the page data of the response page is requested for the original request, the data acquisition request is generated directly from the characteristic parameters in the target structure, instead of simulating browser operations for the original request as in the prior art. The method is therefore applicable not only to systems with a B/S (Browser/Server) architecture, but also to systems with a C/S (Client/Server) architecture and to apps.

In S15, the page data of the response page sent by the server is received.
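Steps S13 to S15 can be sketched with the standard library's `urllib.request`. In this sketch the request is built directly from stored parameters and no browser is involved. Several details are assumptions rather than part of the text: the function names, sending query parameters in a form-encoded body for non-GET methods, and a JSON response body.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import Request, urlopen


def build_data_request(base, path_segments, query, method="GET", headers=None):
    """S13: turn stored characteristic parameters into a concrete HTTP request."""
    path = "/".join(quote(seg, safe="") for seg in path_segments)
    url = f"{base}/{path}"
    data = None
    if method.upper() == "GET":
        if query:
            url += "?" + urlencode(query)
    else:
        # Non-GET requests carry the query parameters as a form-encoded body.
        data = urlencode(query).encode("utf-8")
    return Request(url, data=data, headers=headers or {}, method=method.upper())


def fetch_page_data(request, timeout=10.0):
    """S14/S15: send the request and decode the response (JSON assumed)."""
    with urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Only the data endpoint is contacted, so style sheets and scripts are never downloaded. `fetch_page_data` needs a reachable server, so the checks below exercise only the request-building step.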

The following describes in detail how the data acquisition request corresponding to an original request is determined.

For example, a recording tool may be used to record in advance the process from a user initiating an operation request to obtaining the page corresponding to that request, producing a description file of the recording result.

The description file is then parsed, and the parsed result is converted into a predefined standard format. Because different recording tools may produce description files in different formats, converting the parsed results into a standard format allows a variety of recording tools to be supported. Note that in the prior art, initiating one operation request to obtain the data of its response page involves several request exchanges with the back-end server: for example, a request for the style file of the response page, a request for the JS file of the response page, and a request for the page data of the response page. Among these, the request for the page data of the response page is the one that actually obtains data while the operation request is being responded to; this is the data acquisition request described in the present disclosure.
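The disclosure does not name a recording tool or a description-file format. As one assumed example, the sketch below converts an HTTP Archive (HAR 1.2) log, which many browser developer tools can export, into a hypothetical standard format. The `RecordedRequest` fields and the choice to merge cookies into `params` are illustrative only.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class RecordedRequest:
    """Hypothetical 'standard format' entry for one recorded request."""
    name: str
    time: datetime
    method: str
    url: str
    params: dict          # query-string values and cookies the request carried
    response_text: str


def from_har(har):
    """Convert a HAR 1.2 log (one possible recorder output) into RecordedRequest objects."""
    result = []
    for i, entry in enumerate(har["log"]["entries"]):
        req = entry["request"]
        params = {q["name"]: q["value"] for q in req.get("queryString", [])}
        params.update({c["name"]: c["value"] for c in req.get("cookies", [])})
        result.append(RecordedRequest(
            name=f"req{i}",
            # Older Pythons' fromisoformat() rejects a trailing "Z", so normalize it.
            time=datetime.fromisoformat(entry["startedDateTime"].replace("Z", "+00:00")),
            method=req["method"],
            url=req["url"],
            params=params,
            response_text=entry["response"].get("content", {}).get("text", ""),
        ))
    return sorted(result, key=lambda r: r.time)  # chronological order
```

A converter like this one would be written for each supported recorder. Everything downstream then works only on the common format.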

The description file in the standard format (abbreviated below as the formatted description file) contains all requests made while the operation request is being responded to. For example, suppose the formatted description file contains seven requests, a to g. To determine the data acquisition request from the formatted description file, the page data displayed in the response page can be extracted and analyzed to obtain keyword information. The formatted description file is then searched for a request containing that keyword information; suppose request g is found, and it is taken as the query request. The parameters carried in request g are then used as new keyword information, and the formatted description file is searched for requests containing it. For example, if request g carries cookieA and ciphertext B, the request corresponding to cookieA and the request corresponding to ciphertext B are looked up in the formatted description file. If several requests are found, the one with the earliest request time is taken as the new query request.

If no new request is found, or the request found is a login request, the analysis ends. The query requests determined so far are sorted by request time from earliest to latest, and the sorted result is used as the data acquisition request; that is, the data acquisition request may contain several sub-requests whose order matters.
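The backward analysis above can be sketched as a worklist search. Two interpretations here are assumptions, not stated in the text. First, a request is taken to "contain" a keyword when its response text contains it, that is, when it is the request that produced that value. Second, the stopping conditions (nothing found, or a login request found) are applied to each branch separately, and the login request itself is left out of the result. The names `Req` and `trace_data_requests` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Req:
    """Hypothetical entry of the formatted description file."""
    name: str
    time: float                 # request timestamp
    params: tuple = ()          # (name, value) pairs carried, e.g. cookies, ciphertexts
    response_text: str = ""
    is_login: bool = False


def trace_data_requests(requests, keyword):
    """Walk back from a keyword seen on the response page to the requests that produced it."""
    chain, seen = [], set()
    pending = [keyword]
    while pending:
        kw = pending.pop(0)
        hits = [r for r in requests if kw in r.response_text and r.name not in seen]
        if not hits:
            continue                                  # nothing produced this value: branch ends
        found = min(hits, key=lambda r: r.time)       # several hits: take the earliest
        if found.is_login:
            continue                                  # reached the login request: branch ends
        seen.add(found.name)
        chain.append(found)
        pending.extend(value for _, value in found.params)  # parameters become new keywords
    # The ordered sub-requests, earliest first, form the data acquisition request.
    return sorted(chain, key=lambda r: r.time)
```

In the example from the text, starting from a name shown on the page leads to request g. Its cookie and ciphertext values then lead back to the earlier requests that issued them. Style-sheet requests never match any keyword, so they drop out automatically.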

Any existing recording tool may be used, and parsing the description file is known in the art; the present disclosure does not limit this.

Through the above technical solution, the data acquisition request that is actually used to acquire page data in an operation request can be determined. When page data is acquired, the corresponding data acquisition request can therefore be generated from the original request, which avoids fetching unnecessary style files or JS files and improves the efficiency of page data acquisition.

Optionally, before the step of sending the data acquisition request to the server, the method further includes:

If it is determined that a login operation is required before the original request can be responded to, a response token corresponding to the login operation is obtained. The response token is returned by the server after it authenticates the user's login information, and indicates that the user is legitimately logged in.

In one embodiment, whether a login operation is required before responding to the original request can be determined from an identifier associated with the original request. For example, if a login operation is required, the identifier associated with the original request (such as identifier 1) indicates that login is required; if no login operation is required, the associated identifier (such as identifier 2) indicates that login is not required. The identifier can be associated with the corresponding original request when its data acquisition request is determined. Therefore, once the original request is obtained, whether a login operation is required can be determined directly from its associated identifier.

In the above technical solution, before the data acquisition request is sent to the server, checking whether a login operation is required before responding to the original request establishes whether the current user is in a legitimate state. If a login operation is required, the response token corresponding to the login operation must be obtained to ensure the user is legitimately logged in. The response token can then be carried in the data acquisition request when it is generated. This ensures that the correct response page is returned, improves the efficiency and accuracy of page data acquisition, and prevents the user's login state from affecting it.
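The identifier check and the token attachment can be sketched together as follows. The text does not say how the token travels with the request. This sketch therefore assumes one of two common conventions, a `Cookie` header or an `Authorization: Bearer` header. The function names and the cookie name `token` are hypothetical.

```python
LOGIN_REQUIRED = 1      # "identifier 1" in the text
LOGIN_NOT_REQUIRED = 2  # "identifier 2" in the text


def needs_login(request_id, login_flags):
    """Look up the identifier associated with an original request."""
    return login_flags.get(request_id, LOGIN_NOT_REQUIRED) == LOGIN_REQUIRED


def attach_token(headers, token, scheme="cookie", name="token"):
    """Return a copy of headers that carries the response token."""
    out = dict(headers)
    if scheme == "cookie":
        pair = f"{name}={token}"
        existing = out.get("Cookie")
        out["Cookie"] = f"{existing}; {pair}" if existing else pair
    elif scheme == "bearer":
        out["Authorization"] = f"Bearer {token}"
    else:
        raise ValueError(f"unknown token scheme: {scheme}")
    return out


def prepare_headers(request_id, login_flags, base_headers, get_token):
    """Obtain a token only when the original request is flagged as needing login."""
    if needs_login(request_id, login_flags):
        return attach_token(base_headers, get_token())
    return dict(base_headers)
```

Passing `get_token` as a callable means the login step (script replay or prompting the user) runs only for requests that actually need it.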

Optionally, one exemplary implementation of obtaining the response token corresponding to the login operation is as follows. If a login script corresponding to the original request exists, the login script is replayed so that the response token is obtained once the login information used by the script has been authenticated; the login script is pre-recorded with a login plug-in.

For example, for a relatively simple login process, such as one that requires no verification code or only an image verification code, the user can record a login script with the login plug-in. The login script records the login information the user entered, such as the account and password, and is associated with the original request. Therefore, when it is determined that a login operation is required before responding to the original request, the login script corresponding to the original request can be retrieved and replayed to log in and obtain the response token.
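Replaying a recorded login script can be sketched as iterating over recorded steps against a page driver. The text does not specify the script format or the driver. Here `Step`, its `fill`/`click` actions, the CSS-style locators, and `FakeLoginPage` are all hypothetical. `FakeLoginPage` only stands in for a real browser or client driver and checks credentials locally.

```python
from dataclasses import dataclass


@dataclass
class Step:
    action: str      # "fill" or "click"
    target: str      # element locator recorded by the plug-in
    value: str = ""


class FakeLoginPage:
    """Stand-in for a real driver: checks credentials and issues a token on success."""

    def __init__(self, accounts):
        self.accounts = accounts
        self.fields = {}
        self.token = None

    def fill(self, target, value):
        self.fields[target] = value

    def click(self, target):
        if target == "#login":
            user, pwd = self.fields.get("#user"), self.fields.get("#password")
            if self.accounts.get(user) == pwd:
                self.token = f"token-for-{user}"


def replay_login_script(script, page):
    """Replay each recorded step; return the response token if authentication succeeds."""
    for step in script:
        if step.action == "fill":
            page.fill(step.target, step.value)
        elif step.action == "click":
            page.click(step.target)
        else:
            raise ValueError(f"unsupported step: {step.action}")
    if page.token is None:
        raise PermissionError("login information was not authenticated")
    return page.token
```

Raising on failure, rather than returning an empty token, keeps the caller from issuing a data acquisition request that would only return a login page.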

Optionally, for a login scenario that requires an image verification code, during playback of the login script, if the current operation is a verification code input operation, the verification code image in the current page is obtained and sent to a verification code recognition module, which recognizes the image and obtains the verification code information; the verification code input operation is then performed with the verification code information returned by the recognition module.

In this embodiment, the image verification code shown at login changes dynamically. Therefore, when the current operation during playback of the login script is a verification code input operation, the code displayed in real time on the page must be entered rather than the recorded value. The verification code image on the current page is therefore captured and recognized by the verification code recognition module, and the recognized code is entered to complete the login operation.
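Handling the verification code step during replay can be sketched as follows. The recorded value is discarded, and the code currently shown is obtained from a recognizer. The `"captcha"` step type, `FakeCaptchaPage`, and the toy recognizer are hypothetical. A real recognition module would perform OCR or call a recognition service on an actual image, whereas the fake "image" here is simply the code's bytes.

```python
import random
import string
from dataclasses import dataclass


@dataclass
class Step:
    action: str      # "fill", "click" or "captcha"
    target: str
    value: str = ""  # for "captcha" steps this recorded value is stale and ignored


class FakeCaptchaPage:
    """Toy page whose verification code changes on every request for the image."""

    def __init__(self):
        self.fields, self.expected, self.logged_in = {}, None, False

    def captcha_image(self):
        self.expected = "".join(random.choices(string.ascii_uppercase + string.digits, k=4))
        return self.expected.encode("ascii")   # stand-in for real image bytes

    def fill(self, target, value):
        self.fields[target] = value

    def click(self, target):
        self.logged_in = self.expected is not None and self.fields.get("#captcha") == self.expected


def replay_with_captcha(script, page, recognize):
    """Replay a login script, asking `recognize(image_bytes)` for the live code."""
    for step in script:
        if step.action == "captcha":
            page.fill(step.target, recognize(page.captcha_image()))
        elif step.action == "fill":
            page.fill(step.target, step.value)
        elif step.action == "click":
            page.click(step.target)
        else:
            raise ValueError(f"unsupported step: {step.action}")
    return page.logged_in
```

Passing the recognizer in as a function keeps the replay loop independent of whichever recognition module is used.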

Through the above technical solution, automated login can be achieved with a pre-recorded login script, which reduces the user's involvement and workload. At the same time, by capturing the verification code image and recognizing the verification code, the login can be completed from the login script even though the image verification code on the page changes dynamically. This improves the accuracy and convenience of login, further reduces user operations, and improves the user experience.

Optionally, another implementation of obtaining the response token corresponding to the login operation is as follows. If a login operation is required before the original request can be responded to and no login script corresponding to the original request exists, prompt information is output to prompt the user to log in.

The login status is then detected through the login plug-in; the login status may be, for example, login succeeded or login failed. When the user logs in successfully, the server returns a response token corresponding to the user's login information; the login plug-in then detects the successful login and obtains the response token.

In this embodiment, when no login script corresponding to the original request exists, the response token can be obtained by prompting the user to log in. Specifically, the user logs in through a browser or client while the login plug-in monitors the login operation; the plug-in runs unobtrusively, so its operation is transparent to the user. After the user logs in successfully, the plug-in can obtain the response token issued on successful login. Different systems may store the response token in different places, for example in a cookie, in local storage, or in session storage, so the plug-in can read the data in the cookie, local storage and session storage to obtain the response token.
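Searching the three storage locations for the response token can be sketched as below. The storages are modeled as plain dictionaries, as a plug-in might collect them from `document.cookie`, `localStorage` and `sessionStorage`. The candidate key names in `DEFAULT_TOKEN_KEYS` are assumptions, since each system names its token differently, and the search order (cookie first) is also a choice made for this sketch.

```python
# Hypothetical token key names; a real deployment would configure these per system.
DEFAULT_TOKEN_KEYS = ("token", "access_token", "Authorization", "JSESSIONID")


def parse_cookie_header(header):
    """Turn an 'a=1; b=2' cookie string (as read via document.cookie) into a dict."""
    pairs = (part.split("=", 1) for part in header.split(";") if "=" in part)
    return {k.strip(): v.strip() for k, v in pairs}


def find_response_token(cookies, local_storage, session_storage, keys=DEFAULT_TOKEN_KEYS):
    """Return (location, key, value) for the first token-like entry, or None.

    Cookies are searched first, then localStorage, then sessionStorage;
    key names are compared case-insensitively.
    """
    wanted = {k.lower() for k in keys}
    sources = (("cookie", cookies), ("localStorage", local_storage), ("sessionStorage", session_storage))
    for location, store in sources:
        for name, value in store.items():
            if name.lower() in wanted and value:
                return location, name, value
    return None
```

Returning the location along with the value lets the caller put the token back in the same place, for example as a cookie, when it generates the data acquisition request.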

在上述技术方案中，若在对所述原始请求进行响应之前需要进行登录操作，且不存在与所述原始请求对应的登录脚本，则可以提示用户进行登录。并且用户只需要在其浏览器或客户端进行登录，无需操作登录插件，便可以获得响应令牌，避免用户重复的登录操作，从而保证后续生成数据获取请求的准确性，进而提高获取的页面数据的准确性。In the above technical solution, if a login operation is required before responding to the original request and there is no login script corresponding to the original request, the user can be prompted to log in. And the user only needs to log in through his browser or client without operating the login plug-in to obtain the response token, avoiding the user's repeated login operation, thereby ensuring the accuracy of the subsequent data acquisition request, and further improving the accuracy of the acquired page data.

其中，在用户登录不合法、网络连接超时等情况下，服务器返回的页面通常与原始请求对应的响应页面是不同的，为了保证该情况下获取的页面数据的准确性，本公开还提供以下实施例。Among them, in cases where user login is illegal, network connection times out, etc., the page returned by the server is usually different from the response page corresponding to the original request. In order to ensure the accuracy of the page data obtained in this case, the present disclosure also provides the following embodiments.

Determine a text similarity parameter between first page text information of the response page and second page text information of a response sample corresponding to the original request. The response sample is a preset page that responds to the original request. It may be the response page that was used when the data acquisition request corresponding to the original request was analyzed and determined, or a page set by a developer based on that response page; the present disclosure does not limit this.

By way of example, the first page text information may be a text vector of the response page, and the second page text information may be a text vector of the response sample. The text similarity parameter can then be determined from these two text vectors. For example, the text similarity parameter may be the distance or the cosine similarity between them. Methods for calculating the distance or cosine similarity between vectors are well known in the art and are not described in detail here.
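One common way to build such text vectors is a bag-of-words term-frequency count, from which both parameters follow directly. The sketch below assumes that representation and a simple `\w+` tokenizer. Both are choices made here for illustration, not stated in the disclosure. A real system handling Chinese pages would likely need a proper word segmenter or a learned embedding instead.

```python
import math
import re
from collections import Counter


def text_vector(text: str) -> Counter:
    """Term-frequency vector of a page's text (simple word tokenizer)."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two term-frequency vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def euclidean_distance(a: Counter, b: Counter) -> float:
    """Euclidean distance over the union of terms (missing terms count as 0)."""
    terms = a.keys() | b.keys()
    return math.sqrt(sum((a[t] - b[t]) ** 2 for t in terms))
```

For example, `"Order list: order 1001 shipped"` and `"Order list: order 1002 shipped"` share three of four distinct terms. Both vectors have squared norm 4 + 1 + 1 + 1 = 7, and their dot product is 4 + 1 + 1 = 6, giving a cosine similarity of 6/7 ≈ 0.857. A login page sharing no terms with the response sample scores 0.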

Determine, according to the text similarity parameter, whether the response page matches the response sample.

As an example, if the text similarity parameter is the distance between the text vector of the response page and the text vector of the response sample, the response page is determined to match the response sample when the distance is less than a preset distance threshold.

As an example, if the text similarity parameter is the cosine similarity between the text vector of the response page and the text vector of the response sample, the response page is determined to match the response sample when the cosine similarity is greater than a preset cosine similarity threshold.

The distance threshold and the cosine similarity threshold can be set according to the actual usage scenario, and the present disclosure does not limit them.
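The two matching rules differ only in the direction of the comparison: a smaller distance means more similar, while a larger cosine similarity means more similar. A minimal sketch of the decision follows. The metric names `"distance"` and `"cosine"` are hypothetical labels, and the thresholds are left to the caller because the disclosure leaves them scenario-dependent.

```python
def pages_match(similarity: float, metric: str, threshold: float) -> bool:
    """Decide whether the response page matches the response sample.

    metric == "distance": smaller is more similar, so match if below the threshold.
    metric == "cosine":   larger is more similar, so match if above the threshold.
    """
    if metric == "distance":
        return similarity < threshold
    if metric == "cosine":
        return similarity > threshold
    raise ValueError(f"unknown metric: {metric!r}")
```

Using strict inequalities mirrors the text ("less than" and "greater than"). Whether a value exactly at the threshold should count as a match is left open there as well.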

If it is determined that the response page matches the response sample, the page data of the response page is stored. In other words, the page data is stored only after the response page has been confirmed to be the correct page.

In the above technical solution, matching the response page against the response sample determines whether the page data of the response page is the data that the original request was meant to obtain, and the page data is stored only when the two match. Verifying the response page in this way prevents inconsistency between the obtained page data and the data requested by the original request, which improves the accuracy and validity of page data acquisition. It also avoids storing wrong page data and the resulting waste of storage resources.

For example, the page data of the response page may be stored in a structured file, such as an HTML, JSON, or XML file, so that the page data can then be loaded from the structured file into a database using ETL technology. ETL (Extract-Transform-Load) refers to the process of extracting data from a source, transforming it, and loading it into a destination.
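The sketch below shows one JSON-to-database path using only the Python standard library. It is a minimal sketch under stated assumptions. JSON is picked from the listed formats, SQLite stands in for "a database", and each page-data record is assumed to carry an `id` field. The table name `page_data` is hypothetical.

```python
import json
import sqlite3
import tempfile
from pathlib import Path


def save_page_data(records: list, path: Path) -> None:
    """Write the page data of a matched response page to a JSON file."""
    path.write_text(json.dumps(records, ensure_ascii=False), encoding="utf-8")


def load_into_database(path: Path, conn: sqlite3.Connection) -> int:
    """Minimal ETL pass from the JSON file into a SQLite table; returns rows loaded."""
    rows = json.loads(path.read_text(encoding="utf-8"))  # extract
    # transform: key each record by its (assumed) "id" field, keep the record as JSON text
    pairs = [(str(r["id"]), json.dumps(r, ensure_ascii=False)) for r in rows]
    conn.execute("CREATE TABLE IF NOT EXISTS page_data (id TEXT PRIMARY KEY, body TEXT)")
    conn.executemany("INSERT OR REPLACE INTO page_data VALUES (?, ?)", pairs)  # load
    conn.commit()
    return len(pairs)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        file_path = Path(tmp) / "response_page.json"
        save_page_data([{"id": 1, "item": "order-1001"}], file_path)
        conn = sqlite3.connect(":memory:")
        print(load_into_database(file_path, conn))
        conn.close()
```

`INSERT OR REPLACE` makes re-running the load idempotent for the same record ids. This is one reasonable choice when the same page may be fetched repeatedly.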

In the above technical solution, storing the page data in a structured file makes it easy to load that data into a database. This provides secure and persistent storage, makes the data convenient for users to view, and improves the user experience.

The present disclosure also provides a device for acquiring page data. As shown in FIG. 2, the device 10 includes:

A first receiving module 100, configured to receive an original request for a page operation;

A first storage module 200, configured to obtain characteristic parameters of the original request and store the characteristic parameters in a target structure;

A generating module 300, configured to generate a data acquisition request according to the target structure, wherein the data acquisition request is used to request page data of a response page of the original request;

A sending module 400, configured to send the data acquisition request to the server;

A second receiving module 500, configured to receive the page data of the response page sent by the server.
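As a rough illustration of how the five modules chain together, the sketch below models the target structure as a dataclass and injects the network transport as a callable. With the transport injected, the flow can be exercised without a real server. All names here (`TargetStructure`, `DataAcquisitionRequest`, `PageDataDevice`) are hypothetical. The optional token argument reflects the login embodiment. Sending the token as a `Bearer` header is purely an assumption; the disclosure does not say how the token is attached.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


@dataclass
class TargetStructure:
    """Hypothetical target structure holding the original request's characteristic parameters."""
    method: str
    url: str
    params: Dict[str, str] = field(default_factory=dict)
    headers: Dict[str, str] = field(default_factory=dict)


@dataclass
class DataAcquisitionRequest:
    method: str
    url: str
    params: Dict[str, str]
    headers: Dict[str, str]


class PageDataDevice:
    """Chains the modules of device 10; the transport function is injected."""

    def __init__(self, send: Callable[[DataAcquisitionRequest], str]):
        self._send = send  # plays the role of sending module 400 and receiving module 500

    def receive(self, original_request: dict) -> dict:  # first receiving module 100
        return original_request

    def store_features(self, original_request: dict) -> TargetStructure:  # first storage module 200
        return TargetStructure(
            method=original_request.get("method", "GET"),
            url=original_request["url"],
            params=dict(original_request.get("params", {})),
            headers=dict(original_request.get("headers", {})),
        )

    def generate(self, target: TargetStructure,
                 token: Optional[str] = None) -> DataAcquisitionRequest:  # generating module 300
        headers = dict(target.headers)
        if token is not None:
            headers["Authorization"] = f"Bearer {token}"  # assumption: token sent as a header
        return DataAcquisitionRequest(target.method, target.url, dict(target.params), headers)

    def fetch(self, original_request: dict, token: Optional[str] = None) -> str:
        target = self.store_features(self.receive(original_request))
        request = self.generate(target, token)
        return self._send(request)  # send the request, return the response page data
```

Injecting `send` keeps module 300 free of networking concerns. In practice it could wrap any HTTP client.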

Optionally, the device further comprises:

The generating module is configured to:

Optionally, the device further comprises:

For the device in the above embodiments, the specific manner in which each module performs its operations has been described in detail in the method embodiments and will not be elaborated here.

The present disclosure also provides a computer interface for implementing the steps of the above method for obtaining page data. Encapsulating the method as a computer interface makes it easy for third parties to call, which improves the convenience and compatibility of page data acquisition.

FIG. 3 is a block diagram of an electronic device 700 according to an exemplary embodiment. As shown in FIG. 3, the electronic device 700 may include a processor 701 and a memory 702. The electronic device 700 may further include one or more of a multimedia component 703, an input/output (I/O) interface 704, and a communication component 705.

The processor 701 is configured to control the overall operation of the electronic device 700 so as to complete all or some of the steps of the above method for obtaining page data. The memory 702 is configured to store various types of data to support operation of the electronic device 700. Such data may include, for example, instructions for any application or method operating on the electronic device 700, and application-related data such as contact data, sent and received messages, pictures, audio, and video. The memory 702 may be implemented by any type of volatile or non-volatile storage device or a combination thereof. Examples include static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, a magnetic disk, or an optical disk.

The multimedia component 703 may include a screen and an audio component. The screen may be, for example, a touch screen, and the audio component is configured to output and/or input audio signals. For example, the audio component may include a microphone for receiving external audio signals. The received audio signals may be further stored in the memory 702 or sent through the communication component 705. The audio component further includes at least one speaker for outputting audio signals. The I/O interface 704 provides an interface between the processor 701 and other interface modules, such as a keyboard, a mouse, or buttons; the buttons may be virtual or physical. The communication component 705 is used for wired or wireless communication between the electronic device 700 and other devices. The wireless communication may be Wi-Fi, Bluetooth, Near Field Communication (NFC), 2G, 3G, 4G, NB-IoT, eMTC, 5G, or the like, or a combination of one or more of them, which is not limited here. Accordingly, the communication component 705 may include a Wi-Fi module, a Bluetooth module, an NFC module, and so on.

In an exemplary embodiment, the electronic device 700 may be implemented by one or more application specific integrated circuits (ASIC), digital signal processors (DSP), digital signal processing devices (DSPD), programmable logic devices (PLD), field programmable gate arrays (FPGA), controllers, microcontrollers, microprocessors, or other electronic components, for executing the above method for obtaining page data.

In another exemplary embodiment, a computer-readable storage medium including program instructions is further provided. When executed by a processor, the program instructions implement the steps of the above method for obtaining page data. For example, the computer-readable storage medium may be the above memory 702 including program instructions, which can be executed by the processor 701 of the electronic device 700 to complete the above method for obtaining page data.

FIG. 4 is a block diagram of an electronic device 1900 according to an exemplary embodiment. For example, the electronic device 1900 may be provided as a server. Referring to FIG. 4, the electronic device 1900 includes one or more processors 1922 and a memory 1932 for storing a computer program executable by the processor 1922. The computer program stored in the memory 1932 may include one or more modules, each corresponding to a set of instructions. In addition, the processor 1922 may be configured to execute the computer program so as to perform the above method for obtaining page data.

In addition, the electronic device 1900 may further include a power supply component 1926 and a communication component 1950. The power supply component 1926 may be configured to perform power management of the electronic device 1900. The communication component 1950 may be configured to implement communication of the electronic device 1900, for example, wired or wireless communication. The electronic device 1900 may further include an input/output (I/O) interface 1958. The electronic device 1900 may run an operating system stored in the memory 1932, such as Windows Server™, Mac OS X™, Unix™, or Linux™.

In another exemplary embodiment, a computer-readable storage medium including program instructions is further provided. When executed by a processor, the program instructions implement the steps of the above method for obtaining page data. For example, the computer-readable storage medium may be the above memory 1932 including program instructions, which can be executed by the processor 1922 of the electronic device 1900 to complete the above method for obtaining page data.

In another exemplary embodiment, a computer program product is further provided. The computer program product includes a computer program executable by a programmable device. When executed by the programmable device, the computer program's code portions perform the above method for acquiring page data.

The preferred embodiments of the present disclosure have been described in detail above with reference to the accompanying drawings; however, the present disclosure is not limited to the specific details of these embodiments. Within the technical concept of the present disclosure, various simple modifications can be made to its technical solution, and all such simple modifications fall within the protection scope of the present disclosure.

It should also be noted that the specific technical features described in the above embodiments can be combined in any suitable manner, provided they do not contradict each other. To avoid unnecessary repetition, the various possible combinations are not described separately in the present disclosure.

In addition, the various embodiments of the present disclosure may be combined arbitrarily. As long as such combinations do not depart from the concept of the present disclosure, they should likewise be regarded as content disclosed by the present disclosure.

Claims

1. A method for obtaining page data, characterized in that the method comprises:

Receiving an original request for a page operation;

Obtaining characteristic parameters of the original request, and storing the characteristic parameters in a target structure;

Generating a data acquisition request according to the target structure, wherein the data acquisition request is used to request page data of a response page of the original request;

Sending the data acquisition request to the server;

Receiving page data of the response page sent by the server;

Determining a text similarity parameter between first page text information of the response page and second page text information of a response sample corresponding to the original request;

Determining whether the response page matches the response sample according to the text similarity parameter;

If it is determined that the response page matches the response sample, storing the page data of the response page;

Wherein, when the data acquisition request is determined according to the formatted description file: the page data displayed in the response page is extracted and analyzed to obtain keyword information; a request containing the keyword information is queried from the formatted description file; the parameters carried in the queried request are used as new keyword information, and the query of the formatted description file is repeated with the new keyword information; if multiple requests are found, the request with the earliest request time among them is used as the queried request; if no request is found, or the queried request is a login request, the query process is terminated; the queried requests are then sorted by request time from earliest to latest, and the sorting result is used as the data acquisition request, so that the data acquisition request contains multiple sequentially related sub-requests;

Wherein, a recording tool is used in advance to record the process from the user initiating the original request to obtaining the page corresponding to the original request, yielding a description file of the recording result; the description file is parsed, and the parsed result is converted into a predefined standard format to obtain the formatted description file.
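The recording-and-tracing procedure of claim 1 can be sketched in Python as follows. Several points are assumptions not stated in the claim:

- The recording tool exports an HTTP Archive (HAR) file. The fields used are `log.entries[].startedDateTime`, `request.url`, `request.queryString`, `request.postData.params`, and `response.content.text`.
- A request "contains" a keyword when its response body contains it.
- A login request is recognized by `"login"` appearing in its URL, which is a crude heuristic.
- The login request itself is excluded from the resulting chain.

The record layout produced by `parse_har` stands in for the "predefined standard format".

```python
import json
from typing import List, Set


def parse_har(har_text: str) -> List[dict]:
    """Convert a HAR recording into a simple standard format: one record per request."""
    records = []
    for entry in json.loads(har_text)["log"]["entries"]:
        req = entry["request"]
        params = {q["name"]: q["value"] for q in req.get("queryString", [])}
        for p in req.get("postData", {}).get("params", []):
            params[p["name"]] = p.get("value", "")
        records.append({
            # ISO 8601 strings sort chronologically only if all use the same format and offset
            "time": entry["startedDateTime"],
            "url": req["url"],
            "params": params,
            "response": entry["response"].get("content", {}).get("text", ""),
            "is_login": "login" in req["url"].lower(),  # assumption: URL heuristic
        })
    return records


def trace_data_request(records: List[dict], keywords: Set[str]) -> List[dict]:
    """Walk back from keywords shown on the page to the chain of requests that produced them."""
    found, seen = [], set()
    while True:
        hits = [i for i, r in enumerate(records)
                if i not in seen and any(k in r["response"] for k in keywords)]
        if not hits:
            break  # no request found: terminate the query
        idx = min(hits, key=lambda i: records[i]["time"])  # several hits: take the earliest
        seen.add(idx)
        if records[idx]["is_login"]:
            break  # login request reached: terminate (token is handled separately)
        found.append(records[idx])
        # parameters carried by this request become the new keyword information
        keywords = {v for v in records[idx]["params"].values() if v}
    return sorted(found, key=lambda r: r["time"])  # sub-requests, earliest first
```

For example, suppose a page shows the value `4821`, which came from a report request with parameter `id=R9`. Suppose `R9` came from a list request with parameter `sid=S1`, and `S1` was issued by a login request. The trace then yields the list request followed by the report request.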

2. The method according to claim 1, characterized in that before the step of sending the data acquisition request to the server, the method further comprises:

If it is determined that a login operation is required before responding to the original request, obtaining a response token corresponding to the login operation;

The step of generating a data acquisition request according to the target structure includes:

Generating a data acquisition request according to the target structure and the response token.

3. The method according to claim 2, wherein obtaining a response token corresponding to the login operation comprises:

If there is a login script corresponding to the original request, replaying the login script to obtain a response token issued after the login information used by the login script is authenticated, wherein the login script is pre-recorded and generated based on the login plug-in.

4. The method according to claim 3, characterized in that, during playback of the login script, if the current operation is a verification code input operation, the verification code image in the current page is obtained and sent to a verification code recognition module, so that the verification code recognition module recognizes the verification code image and obtains verification code information; and the verification code input operation is performed according to the verification code information obtained by the verification code recognition module.
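The playback loop of claims 3 and 4 can be sketched as follows. The sketch assumes a recorded script is a list of step dictionaries whose `action` field is `"captcha_input"` for the verification code step. It also assumes that the browser driver, the captcha image source, the recognition module, and the token reader are supplied as callables. All of these names and the step layout are hypothetical.

```python
from typing import Callable, List


def replay_login_script(
    steps: List[dict],
    perform: Callable[[dict], None],
    get_captcha_image: Callable[[], bytes],
    recognize: Callable[[bytes], str],
    read_token: Callable[[], str],
) -> str:
    """Replay recorded login steps, then return the token issued after authentication."""
    for step in steps:
        if step["action"] == "captcha_input":
            image = get_captcha_image()  # verification code image in the current page
            code = recognize(image)      # verification code recognition module
            # perform the input operation with the recognized code instead of a recorded value
            perform({**step, "action": "input", "value": code})
        else:
            perform(step)                # replay any other recorded step unchanged
    return read_token()                  # e.g. read back from cookie or storage after login
```

Passing the recognizer in as a function keeps the replay loop independent of whichever OCR or recognition service is used.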

5. The method according to claim 2, wherein obtaining a response token corresponding to the login operation comprises:

If a login operation is required before responding to the original request and there is no login script corresponding to the original request, outputting a prompt message to prompt the user to perform the login operation;

Detecting a login status through a login plug-in;

Receiving a response token from the login plug-in when a successful login is detected.
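Claims 2, 3 and 5 together describe a decision: no token is needed when no login is required, the recorded script is replayed when one exists, and otherwise the user is prompted while the plug-in watches for a successful login. A minimal sketch follows. Polling the plug-in at a fixed interval until a timeout is an assumption made here; the claim only says the token is received once a successful login is detected. The callables and default timings are hypothetical.

```python
import time
from typing import Callable, Optional


def obtain_response_token(
    login_required: bool,
    login_script: Optional[Callable[[], str]],
    prompt_user: Callable[[str], None],
    poll_plugin: Callable[[], Optional[str]],
    timeout_s: float = 120.0,
    interval_s: float = 0.5,
) -> Optional[str]:
    """Return a response token, or None if no login is needed or the user never logs in."""
    if not login_required:
        return None                      # no login operation needed before responding
    if login_script is not None:
        return login_script()            # replay the recorded login script
    prompt_user("Please log in to continue.")  # no script: ask the user to log in
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        token = poll_plugin()            # plug-in reports a token once login succeeds
        if token:
            return token
        time.sleep(interval_s)
    return None                          # timed out waiting for a successful login
```

An event- or callback-based plug-in would replace the polling loop; the branching logic would stay the same.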

6. The method according to claim 1, wherein storing the page data of the response page comprises:

Storing the page data of the response page in a structured file, so that the page data can be stored in a database based on the structured file.

7. A device for acquiring page data, characterized in that the device comprises:

A first receiving module, used for receiving an original request for a page operation;

A first storage module, used for obtaining characteristic parameters of the original request and storing the characteristic parameters in a target structure;

A generating module, used for generating a data acquisition request according to the target structure, wherein the data acquisition request is used for requesting to acquire page data of a response page of the original request;

A sending module, used for sending the data acquisition request to the server;

A second receiving module, used for receiving the page data of the response page sent by the server;

A first determining module, configured to determine, after the second receiving module receives the page data of the response page sent by the server, a text similarity parameter between first page text information of the response page and second page text information of a response sample corresponding to the original request;

A second determination module, configured to determine whether the response page matches the response sample according to the text similarity parameter;

A second storage module, configured to store the page data of the response page when it is determined that the response page matches the response sample.

8. A computer-readable storage medium having a computer program stored thereon, characterized in that when the program is executed by a processor, the steps of the method according to any one of claims 1 to 6 are implemented.

9. An electronic device, comprising:

a memory having a computer program stored thereon;

A processor, configured to execute the computer program in the memory to implement the steps of the method according to any one of claims 1 to 6.