Disclosure of Invention
The invention provides a multi-language detection method, device, equipment and medium for Web-side documents, which solve the technical problems that the existing multi-language support scheme needs a large amount of manual intervention, the detection efficiency and detection precision are limited, the system updating requirement cannot be met, and different language abnormal documents are difficult to find accurately.
A first aspect of the invention provides a multi-language detection method for a Web-side document, which is applied to a Web end and comprises the following steps:
responding to a detection request, crawling a page document; the page document comprises a static document and/or a dynamic document;
Calling a request simulation component to send a behavior request to a document interface associated with the dynamic document, and carrying out language detection according to the returned request document to generate a first detection result;
detecting the static document according to the detection requirement information to generate a second detection result;
And if the first detection result or the second detection result accords with an abnormal condition, generating and outputting abnormal indication information corresponding to the page document.
Optionally, calling the request simulation component to send a behavior request to the document interface associated with the dynamic document, and carrying out language detection according to the returned request document to generate a first detection result, includes:
calling the request simulation component to send a behavior request to the document interface associated with the dynamic document;
when a request document returned by the document interface is received, comparing the data language of the request document with the expected language of the webpage;
if the data language is the same as the expected language of the webpage, taking detection passed as the first detection result;
if the data language is different from the expected language of the webpage, taking detection failed as the first detection result.
Optionally, the detecting the static document according to the detection requirement information, to generate a second detection result, includes:
if the accuracy detection requirement does not exist in the detection requirement information, acquiring the document language corresponding to the static document;
comparing the document language with the expected language of the webpage, and generating a second detection result according to the comparison result;
if accuracy detection requirements exist in the detection requirement information, acquiring a static configuration file corresponding to the static document;
and generating a second detection result according to the comparison result of the static configuration file and the preset translation table.
Optionally, comparing the document language with the expected language of the webpage and generating a second detection result according to the comparison result includes:
comparing the document language with the expected language of the webpage;
if the document language is the same as the expected language of the webpage, taking detection passed as the second detection result;
if the document language is different from the expected language of the webpage, taking detection failed as the second detection result.
Optionally, generating a second detection result according to the comparison result of the static configuration file and the preset translation table includes:
Comparing the static configuration file with a preset translation table;
if the static configuration file contains code that differs from the translation table, taking detection failed as the second detection result;
if the static configuration file contains no code that differs from the translation table, taking detection passed as the second detection result.
Optionally, the abnormal condition is that the detection fails; and generating and outputting abnormal indication information corresponding to the page document if the first detection result or the second detection result meets the abnormal condition includes:
if the first detection result is that the detection fails, locating interface path information of the document interface, generating abnormal indication information corresponding to the page document in table form, and outputting the abnormal indication information;
if the second detection result is that the detection fails, locating an element screenshot from the static document or locating a code position from the static configuration file, generating abnormal indication information corresponding to the page document in table form, and outputting the abnormal indication information.
Optionally, the method further comprises:
if neither the first detection result nor the second detection result meets the abnormal condition, jumping back to execute the step of responding to the detection request and crawling the page document.
A second aspect of the present invention provides a multi-language detection device for a Web-side document, which is applied to a Web end, and the device comprises:
the document crawling module is used for crawling the page document in response to the detection request; the page document comprises a static document and/or a dynamic document;
the dynamic document detection module is used for calling the request simulation component to send a behavior request to a document interface associated with the dynamic document, and carrying out language detection according to the returned request document to generate a first detection result;
the static document detection module is used for detecting the static document according to the detection requirement information to generate a second detection result;
And the abnormal information output module is used for generating and outputting abnormal indication information corresponding to the page document if the first detection result or the second detection result accords with an abnormal condition.
A third aspect of the present invention provides an electronic device, including a memory and a processor, where the memory stores a computer program, where the computer program, when executed by the processor, causes the processor to perform the steps of the multi-language detection method of the Web-side document according to any one of the first aspect of the present invention.
A fourth aspect of the present invention provides a computer readable storage medium having stored thereon a computer program which when executed implements a method of multi-language detection of Web-side documents according to any one of the first aspects of the present invention.
From the above technical scheme, the invention has the following advantages:
According to the invention, the Web end responds to the detection request to crawl the page document; the page document comprises a static document and/or a dynamic document; calling a request simulation component to send a behavior request to a document interface associated with a dynamic document, and carrying out language detection according to the returned request document to generate a first detection result; detecting the static document according to the detection requirement information to generate a second detection result; if the first detection result or the second detection result accords with the abnormal condition, generating and outputting abnormal indication information corresponding to the page document. Therefore, by crawling the page documents in real time and respectively carrying out different detection according to the types of the page documents, the detection efficiency is effectively improved under the condition of reducing manual intervention, and abnormal documents in different languages can be accurately found.
Detailed Description
The embodiment of the invention provides a multi-language detection method, device, equipment and medium for Web-side documents, which are used for solving the technical problems that the existing multi-language support scheme needs a large amount of manual intervention, the detection efficiency and detection precision are limited, the system updating requirement cannot be met, and different language abnormal documents are difficult to find accurately.
In order to make the objects, features and advantages of the present invention more comprehensible, the technical solutions in the embodiments of the present invention are described in detail below with reference to the accompanying drawings, and it is apparent that the embodiments described below are only some embodiments of the present invention, but not all embodiments of the present invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Referring to fig. 1, fig. 1 is a flowchart illustrating steps of a multi-language detection method for a Web document according to an embodiment of the present invention.
The invention provides a multi-language detection method of a Web terminal document, which is applied to a Web terminal and comprises the following steps:
Step 101, responding to a detection request, and crawling a page document; the page document comprises a static document and/or a dynamic document;
A page document refers to the text content used in the various parts of a web page, or on individual pages, to convey information, guide user behavior, enhance user experience, and facilitate interaction with users. The page document may appear in various locations of the web page, such as a title, subtitle, paragraph, button, link, form, image caption, pop-up prompt, footer information, etc. The same page can include a static document and a dynamic document at the same time, and the specific document type is determined by the corresponding control type. A static document is a document whose content is relatively fixed and does not change with user behavior or time, such as an item detail page. A dynamic document is a document whose content can be changed or adjusted in real time according to factors such as user behavior, time, and environment; it is generated by analyzing user data and predicting user intention through algorithms so as to better suit the user's needs, for example dynamically updated or personalized recommendations of video advertisements and social media.
In the embodiment of the invention, the Web end responds to the input detection request and crawls the currently displayed page document in real time to obtain the data basis for subsequent processing. For static documents, the data can be crawled using the requests library and the BeautifulSoup library. Dynamic documents may be generated dynamically by JavaScript or returned by a back-end request; in that case, a specific request simulation component, such as Selenium, can be called to simulate browser behavior or user requests and capture the dynamically generated content.
It should be noted that when the page document is crawled, it needs to be divided into a static document and a dynamic document. The division can be done in various ways. For example, the page document can be crawled multiple times at a certain time interval, and the results of the crawls compared to find the parts that stay the same. If identical parts and differing parts exist at the same time, the current page contains both a static document and a dynamic document: the identical parts are classified as the static document, the differing parts are determined to be the dynamic document, and the subsequent document detection is carried out on each separately so as to detect in real time whether the page document is abnormal. If the page document is the same in every crawl, it is a static document; if the differing parts appear at different positions and their content keeps changing, the page document is a dynamic document.
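The repeated-crawl comparison described above can be sketched as follows. This is a minimal illustration, not the embodiment's implementation: it assumes each crawl has already been reduced to a list of text segments, and the function name `split_static_dynamic` is hypothetical.

```python
from typing import Dict, List, Set


def split_static_dynamic(snapshots: List[List[str]]) -> Dict[str, List[str]]:
    """Split crawled text segments into static and dynamic groups.

    `snapshots` holds one list of text segments per crawl of the same page.
    A segment present in every crawl is treated as static; any segment
    missing from at least one crawl is treated as dynamic.
    """
    if not snapshots:
        return {"static": [], "dynamic": []}

    # Segments that appear in every crawl.
    common: Set[str] = set(snapshots[0])
    for snap in snapshots[1:]:
        common &= set(snap)

    static: List[str] = []
    dynamic: List[str] = []
    seen: Set[str] = set()
    # Keep first-seen order so the result is deterministic.
    for snap in snapshots:
        for segment in snap:
            if segment in seen:
                continue
            seen.add(segment)
            (static if segment in common else dynamic).append(segment)
    return {"static": static, "dynamic": dynamic}
```

With three crawls that share "Home" and "Cart" but show a different advertisement each time, the shared segments land in the static group and the advertisements in the dynamic group.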
Step 102, calling a request simulation component to send a behavior request to a document interface associated with a dynamic document, and carrying out language detection according to the returned request document to generate a first detection result;
The document interface refers to a technical interface, which allows a background server or system to generate and return dynamic document contents, such as an API interface, an SDK/library, or front-end JavaScript, in real time according to preset rules, behavior requests, or environment variables.
The request simulation component refers to a component, such as Selenium, that can simulate operations of a user or an upstream module in a browser, such as clicking, inputting, or scrolling, to implement request simulation for a dynamic web page.
In the embodiment of the invention, the behavior request corresponding to the dynamic document is generated by calling the request simulation component and is sent to the document interface so as to receive the request document returned by the document interface, and the first detection result is generated by detecting the data language in the request document and carrying out matching detection with the expected language of the document so as to judge whether the dynamic document in the page document is abnormal or not.
In one example of the invention, step 102 may comprise the sub-steps of:
calling a request simulation component to send a behavior request to a document interface associated with the dynamic document;
when a request document returned by the document interface is received, comparing the data language of the request document with the expected language of the webpage;
if the data language is the same as the expected language of the webpage, taking detection passed as the first detection result;
if the data language is different from the expected language of the webpage, taking detection failed as the first detection result.
In this embodiment, user behavior such as clicking a link, filling in a form, or selecting a drop-down menu item may be simulated by calling a Selenium script, so as to generate a behavior request to the document interface associated with the dynamic document; the document interface then returns the corresponding request document to the Web end. The Web end can extract the data language from the returned dynamic document through the Selenium script and compare it with the expected language of the webpage preset for the page. If the languages are the same, the detection passes; this is taken as the first detection result and subsequent operations can continue. If the languages are different, the dynamic document generated by the document interface is abnormal; detection failed is taken as the first detection result, and the path of the document interface is determined and recorded in the subsequent operation.
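The comparison step can be illustrated with a short sketch. It assumes the request document has already been fetched and parsed into a JSON-like structure, and that a language detector is supplied by the caller; the names `iter_strings`, `first_detection_result`, and the `detect` callable are hypothetical and are not part of Selenium or any other library.

```python
from typing import Any, Callable, Dict, Iterator, List


def iter_strings(payload: Any) -> Iterator[str]:
    """Yield every non-empty string value nested in a JSON-like payload."""
    if isinstance(payload, str):
        if payload.strip():
            yield payload
    elif isinstance(payload, dict):
        for value in payload.values():
            yield from iter_strings(value)
    elif isinstance(payload, (list, tuple)):
        for item in payload:
            yield from iter_strings(item)


def first_detection_result(
    payload: Any,
    expected_lang: str,
    detect: Callable[[str], str],
) -> Dict[str, Any]:
    """Compare the language of each returned string with the expected one.

    `detect` is any function mapping text to a language code (assumption:
    the caller wires in a real detector). The result records whether the
    detection passed and which strings did not match.
    """
    mismatches: List[Dict[str, str]] = []
    for text in iter_strings(payload):
        lang = detect(text)
        if lang != expected_lang:
            mismatches.append({"text": text, "detected": lang})
    return {"passed": not mismatches, "mismatches": mismatches}
```

Passing the detector in as a parameter keeps the comparison logic independent of whichever detection library or online service is chosen.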
Step 103, detecting the static document according to the detection requirement information to generate a second detection result;
The detection requirement information refers to information for specifying detection contents of the static document, and may include, but is not limited to, accuracy detection requirements and language detection requirements, and specific requirements may be represented by fields or identification forms. The information may be generated by the Web side in response to external user input.
While the dynamic document is being detected, the detection content of the static document can be determined according to the detection requirement information, so that the static document is detected accordingly and a second detection result is generated to judge whether the static document is abnormal.
In one example of the invention, step 103 may comprise the following sub-steps S11-S14:
S11, if the accuracy detection requirement does not exist in the detection requirement information, acquiring the document language corresponding to the static document;
In this embodiment, when the detection requirement information is received, it is parsed to determine whether it contains a field or mark representing the accuracy detection requirement. If the accuracy detection requirement does not exist, only the document language needs to be detected at this time, and the document language corresponding to the static document can be obtained as the data basis for subsequent detection.
S12, comparing the document language with the expected language of the webpage, and generating a second detection result according to the comparison result;
Further, S12 may comprise the sub-steps of:
comparing the document language with the expected language of the webpage;
if the document language is the same as the expected language of the webpage, taking detection passed as the second detection result;
if the document language is different from the expected language of the webpage, taking detection failed as the second detection result.
In this embodiment, after the document language is obtained, it is compared with the expected language of the webpage. The detection can use the langdetect or langid library commonly used in Python, or an online API such as the Google Translate API, with the crawled static document placed into the text box for detection. Whether the document language is the same as the expected language of the webpage is then taken as the second detection result.
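As a dependency-free illustration of this check, the sketch below replaces a real language detector with a coarse Unicode-script heuristic from the Python standard library. This is a swapped-in technique, not langdetect or langid: it can tell Latin text from Chinese or Cyrillic text but cannot distinguish languages that share a script. The mapping `EXPECTED_SCRIPTS` and both function names are hypothetical.

```python
import unicodedata
from collections import Counter
from typing import Dict, Set

# Hypothetical mapping from language code to the scripts it may use.
EXPECTED_SCRIPTS: Dict[str, Set[str]] = {
    "en": {"LATIN"},
    "ru": {"CYRILLIC"},
    "zh": {"CJK"},
    "ja": {"CJK", "HIRAGANA", "KATAKANA"},
}


def dominant_script(text: str) -> str:
    """Return the most frequent script among the letters of `text`.

    The script is taken from the first word of each character's Unicode
    name (for example "LATIN" or "CJK"). Returns "" if there are no letters.
    """
    counts: Counter = Counter()
    for ch in text:
        if ch.isalpha():
            name = unicodedata.name(ch, "")
            if name:
                counts[name.split()[0]] += 1
    return counts.most_common(1)[0][0] if counts else ""


def language_matches(text: str, expected_lang: str) -> bool:
    """Coarse check that `text` is written in a script allowed for the language.

    Text without letters (digits, punctuation) is treated as passing.
    """
    script = dominant_script(text)
    if not script:
        return True
    return script in EXPECTED_SCRIPTS.get(expected_lang, set())
```

In practice the body of `language_matches` would call whichever detector the deployment uses; the surrounding pass/fail logic stays the same.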
S13, if accuracy detection requirements exist in the detection requirement information, acquiring a static configuration file corresponding to the static document;
In another example, if there is an accuracy detection requirement in the detection requirement information, accuracy detection is required for the static document. In a multi-language usage scenario, a static document with the same meaning may have multiple translations in the same language at the same time; comparing the translated content manually in that case involves a large workload and low detection efficiency. Therefore, the static configuration file corresponding to the static document can be obtained as the content to compare.
The static configuration file refers to the translation configuration file corresponding to the static document. It can include code for the unique identifier of a configuration item, source text, translated text, notes, format, conditional translation, variables, and the like, and can be in JSON, YAML, or any format required by the project. To generate it, the translation team for the page can first provide the translation mapping of words in table form; developers or a machine learning model then convert the table into configuration files for the different languages, which are written into the configurations of the different static documents.
S14, generating a second detection result according to the comparison result of the static configuration file and the preset translation table.
Further, S14 may include the sub-steps of:
comparing the static configuration file with a preset translation table;
if the static configuration file contains code that differs from the translation table, taking detection failed as the second detection result;
if the static configuration file contains no code that differs from the translation table, taking detection passed as the second detection result.
In this embodiment, the Web end may access, through a script, the translation table (such as an Excel or CSV file) provided by the translators and the static configuration file deployed for the static document. The translation table is read with a suitable library, and each column is parsed to determine the unique identifier, source text, and translated text; likewise, all static configuration files are read with a library and their file structure is parsed. Each row of the translation table (that is, each translation item) is then traversed, and the unique identifier of each item is used to look up the corresponding translation code in the static configuration file. If no code differs from the translation item, the detection is judged to pass. If the static configuration file contains code, such as a unique identifier, source text, or translated text, that differs from the translation table, the translation has a problem, and detection failed is taken as the second detection result. Comparing against the table in this way quickly finds any inconsistent position, which improves detection efficiency and effectively determines the location of configuration bugs.
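The row-by-row comparison can be sketched with the standard `csv` and `json` modules. The sketch makes several assumptions that the text does not fix: the translation table is CSV with columns named `id`, `source`, and `translation`, and the static configuration file is a flat JSON object mapping identifiers to translated text. The function name `compare_translations` is hypothetical.

```python
import csv
import io
import json
from typing import Dict, List


def compare_translations(table_csv: str, config_json: str) -> List[Dict[str, str]]:
    """Return one record per translation item that the config gets wrong.

    Each row of the table is looked up in the config by its unique
    identifier; a missing key or a differing translation is reported.
    An empty result means the detection passed.
    """
    config: Dict[str, str] = json.loads(config_json)
    mismatches: List[Dict[str, str]] = []
    for row in csv.DictReader(io.StringIO(table_csv)):
        key = row["id"]                # assumed column name
        expected = row["translation"]  # assumed column name
        actual = config.get(key)
        if actual != expected:
            mismatches.append({
                "id": key,
                "expected": expected,
                "actual": "" if actual is None else actual,
            })
    return mismatches
```

The returned records carry the identifier of each inconsistent item, which is what a later reporting step would need to point at the code position in the configuration file.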
Step 104, if the first detection result or the second detection result meets the abnormal condition, generating and outputting abnormal indication information corresponding to the page document.
In one example of the invention, the abnormal condition is that the detection fails; step 104 may include the sub-steps of:
if the first detection result is that the detection fails, locating interface path information of the document interface, generating abnormal indication information corresponding to the page document in table form, and outputting the abnormal indication information;
if the second detection result is that the detection fails, locating an element screenshot from the static document or locating a code position from the static configuration file, generating abnormal indication information corresponding to the page document in table form, and outputting the abnormal indication information.
In the embodiment of the invention, if the first detection result is that the detection fails, it indicates that the language exception exists in the dynamic document returned by the document interface, which may be due to an error in calling the interface or that the document interface cannot provide the dynamic document of the language. In order to further eliminate abnormal conditions, interface paths corresponding to the document interfaces can be positioned, interface path information, such as contents of a basic URL, a version number, a resource name, a resource identifier and the like, is acquired in a form of table statistics, and abnormal indication information corresponding to the page document is generated and output.
If the second detection result is that the detection fails, the document language or the translation accuracy of the static document is abnormal, and the specific detection type behind the second detection result can be determined from the parsing result of the detection requirement information. If the document language detection failed, element screenshots of every abnormal position can be located from the static document and output together as Excel statistics, so as to inform technicians of the positions in the static document that need to be modified. If the accuracy detection failed, the positions of the abnormal translation code are located from the static configuration file according to the comparison result of the static configuration file and the translation table, the code positions are collected into the table, and the abnormal indication information is generated and output.
Through the output of the table-form abnormality indication information, a detailed detection report can be automatically generated, including the detected language, the abnormality document, the code position, the element screenshot and other information, so that further analysis and decision are facilitated, and the language detection model and algorithm are continuously optimized and improved through data feedback and analysis, and the system performance is improved.
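A table-form report of this kind can be sketched as below. CSV is used here as a stand-in for the Excel output mentioned in the text so that the example needs only the standard library; the column names in `REPORT_FIELDS` and the function name `build_report` are hypothetical.

```python
import csv
import io
from typing import Dict, List

# Hypothetical report columns; a real report may carry more fields,
# such as a screenshot path or the abnormal text itself.
REPORT_FIELDS = ["page", "document_type", "detected_language", "location"]


def build_report(anomalies: List[Dict[str, str]]) -> str:
    """Render anomaly records as CSV text with a fixed header row.

    Missing fields are written as empty cells; unknown fields are ignored.
    """
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=REPORT_FIELDS, lineterminator="\n")
    writer.writeheader()
    for anomaly in anomalies:
        writer.writerow({field: anomaly.get(field, "") for field in REPORT_FIELDS})
    return buffer.getvalue()
```

For a dynamic document the `location` column would hold the interface path, and for a static document it would hold the code position or the path of the element screenshot.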
In addition, in another example of the present invention, when the first detection result or the second detection result is that the detection fails, the abnormal indication information can be generated in another way. Specifically, the problem document is located without first distinguishing whether it is a static document or a dynamic document, and its position or screenshot is recorded. The development end to which the problem document belongs is then judged: if it belongs to the front end, the problem document is a static document and the screenshot of the corresponding position is returned directly; if it belongs to back-end data, the problem document is a dynamic document, and the corresponding interface information is located and returned.
In another example of the present invention, the method further comprises:
if neither the first detection result nor the second detection result meets the abnormal condition, jumping back to execute the step of responding to the detection request and crawling the page document.
In this embodiment, if the first detection result and the second detection result both show that the page document passes the detection, the language and the accuracy of the page document are normal at this time, and the flow may jump to step 101 to start a new round of page document detection.
In addition, if it is determined that neither the first detection result nor the second detection result meets the abnormal condition, no processing may be performed, and a new page document crawling request may be waited to trigger the flow of steps 101 to 104.
It should be noted that, in this embodiment, the document crawling is divided into crawling of static documents and dynamic documents, and the detection of various web page documents includes URL detection and API detection, so as to detect web addresses and call interfaces of web page documents in different languages respectively, and determine whether translation of the web page documents is abnormal. If the abnormal condition exists, the abnormal indication information can be obtained by screenshot acquisition and/or interface positioning of the webpage document, so that technicians can repair the abnormal condition of the webpage document under various languages conveniently.
Referring to fig. 2, fig. 2 shows an overall detection flowchart of a multi-language detection method for a Web document in an embodiment of the invention.
In this embodiment, the webpage document is crawled in response to the detection request, and it is judged whether the crawled webpage document is a static document.
If the webpage document is not a static document, it is judged to be a dynamic document; a request is simulated to access the corresponding document interface, and the returned data language is determined. It is then judged whether the data language is consistent with the expected language of the webpage document. If so, no processing is performed; otherwise, the interface path of the document interface is recorded, Excel statistics are compiled, and the corresponding abnormal indication information is returned to the Web end for display.
If the crawled webpage document is judged to be a static document, it is further judged whether the accuracy of the document needs to be detected. If the accuracy needs to be detected, the static configuration file of the static document is compared with the translation table to judge whether the two are consistent. If so, no processing is performed; otherwise, the specific code position is located from the static configuration file, Excel statistics are compiled, and the corresponding abnormal indication information is returned to the Web end for display.
If the accuracy of the static document does not need to be detected, the document language of the static document is determined directly, and it is judged whether the document language is consistent with the expected language. If so, no processing is performed; otherwise, an element screenshot of the text in the differing language, or of the whole static document, is located, Excel statistics are compiled, and the corresponding abnormal indication information is returned to the Web end for display.
In the embodiment of the invention, the Web end responds to the detection request to crawl the page document; the page document comprises a static document and/or a dynamic document; calling a request simulation component to send a behavior request to a document interface associated with a dynamic document, and carrying out language detection according to the returned request document to generate a first detection result; detecting the static document according to the detection requirement information to generate a second detection result; if the first detection result or the second detection result accords with the abnormal condition, generating and outputting abnormal indication information corresponding to the page document. Therefore, by crawling the page documents in real time and respectively carrying out different detection according to the types of the page documents, the detection efficiency is effectively improved under the condition of reducing manual intervention, and abnormal documents in different languages can be accurately found.
Referring to fig. 3, fig. 3 is a block diagram illustrating a multi-language detection device for a Web document according to an embodiment of the invention.
The embodiment of the invention provides a multi-language detection device of a Web terminal document, which is applied to a Web terminal and comprises:
a document crawling module 301, configured to crawl a page document in response to a detection request; the page document comprises a static document and/or a dynamic document;
The dynamic document detection module 302 is configured to invoke the request simulation component to send a behavior request to a document interface associated with a dynamic document, and perform language detection according to the returned request document, so as to generate a first detection result;
the static document detection module 303 is configured to detect the static document according to the detection requirement information, and generate a second detection result;
The abnormal information output module 304 is configured to generate and output abnormal indication information corresponding to the page document if the first detection result or the second detection result meets an abnormal condition.
Optionally, the dynamic document detection module 302 is specifically configured to:
calling a request simulation component to send a behavior request to a document interface associated with the dynamic document;
when a request document returned by the document interface is received, comparing the data language of the request document with the expected language of the webpage;
if the data language is the same as the expected language of the webpage, taking "detection passed" as the first detection result;
if the data language is different from the expected language of the webpage, taking "detection failed" as the first detection result.
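The dynamic document check above can be sketched as follows. This is a minimal illustration, not the implementation of the embodiment: `send_behavior_request` is a hypothetical callable standing in for the request simulation component, and `guess_language` is a toy heuristic (any CJK ideograph means "zh", otherwise "en") used only so the example runs without a real language identifier.

```python
from typing import Callable


def guess_language(text: str) -> str:
    """Toy heuristic (assumption): 'zh' if any CJK ideograph is present, else 'en'."""
    for ch in text:
        if "\u4e00" <= ch <= "\u9fff":
            return "zh"
    return "en"


def detect_dynamic(send_behavior_request: Callable[[], str],
                   expected_language: str) -> str:
    """Send the simulated behavior request and compare languages.

    Returns 'pass' when the data language of the returned request
    document equals the expected language of the webpage, else 'fail'.
    """
    request_document = send_behavior_request()
    data_language = guess_language(request_document)
    return "pass" if data_language == expected_language else "fail"
```

For example, a page whose expected language is English passes when the interface returns "Order submitted" and fails when it returns "订单已提交".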
Optionally, the static document detection module 303 includes:
a document language acquisition sub-module, configured to acquire the document language corresponding to the static document if the detection requirement information contains no accuracy detection requirement;
a language comparison sub-module, configured to compare the document language with the expected language of the webpage and generate a second detection result according to the comparison result;
a configuration file acquisition sub-module, configured to acquire a static configuration file corresponding to the static document if the detection requirement information contains an accuracy detection requirement;
and a configuration file comparison sub-module, configured to generate a second detection result according to the result of comparing the static configuration file with a preset translation table.
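The branching between the two static detection paths can be sketched as follows. The flag name `accuracy_required`, the dictionary shapes and the `guess_language` heuristic are all assumptions made for illustration; the embodiment does not specify how the detection requirement information is encoded.

```python
from typing import Dict, Optional


def guess_language(text: str) -> str:
    """Toy heuristic (assumption): 'zh' if any CJK ideograph is present, else 'en'."""
    return "zh" if any("\u4e00" <= ch <= "\u9fff" for ch in text) else "en"


def detect_static(requirement_info: Dict[str, bool],
                  static_text: str,
                  expected_language: str,
                  static_config: Optional[Dict[str, str]] = None,
                  translation_table: Optional[Dict[str, str]] = None) -> str:
    """Return the second detection result, 'pass' or 'fail'."""
    if requirement_info.get("accuracy_required", False):
        # Accuracy path: every configured text must match the translation table.
        config = static_config or {}
        table = translation_table or {}
        matches = all(table.get(key) == value for key, value in config.items())
        return "pass" if matches else "fail"
    # Language-only path: compare the document language with the expected one.
    return "pass" if guess_language(static_text) == expected_language else "fail"
```

Without an accuracy requirement only the language is checked, so a misspelled but English string still passes; with the requirement set, the same string fails against the translation table.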
Optionally, the language comparison submodule is specifically configured to:
comparing the document language with the expected language of the webpage;
if the document language is the same as the expected language of the webpage, taking "detection passed" as the second detection result;
if the document language is different from the expected language of the webpage, taking "detection failed" as the second detection result.
Optionally, the configuration file comparison sub-module is specifically configured to:
comparing the static configuration file with a preset translation table;
if the static configuration file contains code that differs from the translation table, taking "detection failed" as the second detection result;
if the static configuration file contains no code that differs from the translation table, taking "detection passed" as the second detection result.
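A minimal sketch of the configuration file comparison follows. It assumes, purely for illustration, that both the static configuration file and the preset translation table have been loaded as key-to-text dictionaries; the keys and strings shown are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Mismatch = Tuple[str, Optional[str], str]  # (key, expected text, actual text)


def compare_with_translation_table(
        static_config: Dict[str, str],
        translation_table: Dict[str, str]) -> Tuple[str, List[Mismatch]]:
    """Compare every entry of the static configuration with the table.

    Returns ('fail', mismatches) if any entry differs from the table
    (including entries missing from the table), else ('pass', []).
    """
    mismatches: List[Mismatch] = []
    for key, actual in static_config.items():
        expected = translation_table.get(key)
        if actual != expected:
            mismatches.append((key, expected, actual))
    return ("fail" if mismatches else "pass"), mismatches
```

Returning the differing entries alongside the result keeps enough information to locate the offending position later, when the abnormal indication information is generated.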
Optionally, the abnormal condition is a detection failure; the abnormal information output module 304 is specifically configured to:
if the first detection result is "detection failed", locating the interface path information of the document interface, generating the abnormal indication information corresponding to the page document in table form, and outputting the abnormal indication information;
if the second detection result is "detection failed", locating an element screenshot from the static document or locating a code position from the static configuration file, generating the abnormal indication information corresponding to the page document in table form, and outputting the abnormal indication information.
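The table-form output can be sketched as follows. The embodiment does not fix a table format, so CSV is used here as an assumption; the column names, the interface path and the code position are hypothetical values chosen for illustration.

```python
import csv
import io


def build_anomaly_table(first_result: str, second_result: str,
                        interface_path: str, static_location: str) -> str:
    """Render abnormal indication information as CSV text.

    A row is emitted for each detection result equal to 'fail':
    the dynamic row carries the interface path, the static row carries
    the element screenshot reference or code position.
    """
    rows = []
    if first_result == "fail":
        rows.append({"document_type": "dynamic", "location": interface_path})
    if second_result == "fail":
        rows.append({"document_type": "static", "location": static_location})

    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["document_type", "location"],
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


print(build_anomaly_table("fail", "pass", "/api/order/submit", "locales/en.json:12"))
```

With only the first detection failing, the output contains the header line followed by a single `dynamic` row holding the interface path.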
Optionally, the apparatus further comprises:
a loop detection module, configured to return to the step of crawling the page document in response to the detection request if neither the first detection result nor the second detection result meets the abnormal condition.
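The return-to-crawl behavior can be sketched as a simple loop. The three callables are hypothetical stand-ins for the crawling and detection modules, and the `max_rounds` bound is an assumption added so that the sketch terminates; the embodiment itself does not state a stopping condition.

```python
from typing import Callable, Optional, Tuple


def run_detection_loop(crawl: Callable[[], str],
                       detect_dynamic: Callable[[str], str],
                       detect_static: Callable[[str], str],
                       max_rounds: int = 3) -> Optional[Tuple[int, str, str]]:
    """Crawl and detect repeatedly until an abnormal condition is met.

    Returns (round number, first result, second result) for the first
    round in which either result is 'fail', or None if no anomaly is
    found within max_rounds.
    """
    for round_index in range(1, max_rounds + 1):
        page_document = crawl()
        first = detect_dynamic(page_document)
        second = detect_static(page_document)
        if first == "fail" or second == "fail":
            return round_index, first, second
        # Neither result is abnormal: go back to the crawling step.
    return None
```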
The embodiment of the invention provides an electronic device, which comprises a memory and a processor, wherein the memory stores a computer program, and the computer program when executed by the processor causes the processor to execute the steps of the multi-language detection method of the Web end document according to any embodiment of the invention.
An embodiment of the present invention provides a computer readable storage medium having stored thereon a computer program which, when executed, implements a multi-language detection method for a Web-side document according to any embodiment of the present invention.
It will be clearly understood by those skilled in the art that, for convenience and brevity of description, specific working procedures of the apparatus, modules and sub-modules described above may refer to corresponding procedures in the foregoing method embodiments, which are not described herein again.
In the several embodiments provided by the present invention, it should be understood that the disclosed apparatus and method may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative: the division of the modules is merely a division by logical function, and other divisions are possible in actual implementation; for instance, multiple modules or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection via some interfaces, devices or modules, and may be in electrical, mechanical, or other forms.
The modules described as separate components may or may not be physically separate, and components shown as modules may or may not be physical modules, i.e., may be located in one place, or may be distributed over a plurality of network modules. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional module in each embodiment of the present invention may be integrated into one processing module, or each module may exist alone physically, or two or more modules may be integrated into one module. The integrated modules may be implemented in hardware or in software functional modules.
The integrated modules, if implemented in the form of software functional modules and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention in essence, or the part of it that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a storage medium, comprising several instructions for causing an electronic device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, or any other medium capable of storing program code.
The above embodiments are only for illustrating the technical solution of the present invention, and not for limiting the same; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.