CN120372117A - Page processing method, page processing device, electronic device, storage medium and program product - Google Patents

Page processing method, page processing device, electronic device, storage medium and program product

Info

Publication number
CN120372117A
Authority
CN
China
Prior art keywords
page
word
accessed
entity word
entity
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202510571820.4A
Other languages
Chinese (zh)
Inventor
高文灵
荣伟伟
袁闽
张思龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202510571820.4A
Publication of CN120372117A
Legal status: Pending

Landscapes

  • Information Transfer Between Computers (AREA)

Abstract

The disclosure provides a page processing method, a page processing device, electronic equipment, a storage medium and a program product, and relates to the technical field of computers, in particular to the field of page processing. The method comprises the steps of responding to a page access request, sending address information of a page to be accessed to a server to obtain entity word data of the page to be accessed returned by the server, positioning at least one entity word from the page to be accessed based on the entity word data, and carrying out clickable processing on the entity word to obtain a target page.

Description

Page processing method, page processing device, electronic device, storage medium and program product
Technical Field
The disclosure relates to the field of computer technology, in particular to the field of page processing, and specifically provides a page processing method, a device, electronic equipment, a storage medium and a program product.
Background
With the development of mobile smart devices and internet technology, using a mobile smart device to obtain information and read over the internet has gradually become mainstream. While reading, a user can manually select a span of text as a keyword and then search on it, according to actual needs.
However, because the screen of a mobile smart device is small, selecting keywords in this way requires fine-grained operation. It is difficult for the user to select a keyword accurately in one attempt, and the selection often has to be adjusted manually to ensure that the keyword is selected correctly. As a result, performing a secondary search from a page is cumbersome, the operation threshold is high, and the user experience is poor.
Disclosure of Invention
The disclosure provides a page processing method, a page processing device, an electronic device, a storage medium and a program product.
According to one aspect of the disclosure, a page processing method is provided, which comprises the steps of responding to a page access request, sending address information of a page to be accessed to a server to obtain entity word data of the page to be accessed returned by the server, positioning at least one entity word from the page to be accessed based on the entity word data, and carrying out clickable processing on the entity word to obtain a target page.
According to another aspect of the disclosure, a page processing method is provided, which includes determining entity word data of a page to be accessed based on address information of the page to be accessed in response to receiving the address information of the page to be accessed from a client, and sending the entity word data of the page to be accessed to the client, so that the client can locate at least one entity word from the page to be accessed based on the entity word data, and clickable processing is performed on the entity word to obtain a target page.
According to another aspect of the disclosure, a page processing device is provided, which comprises an information sending module, an entity word positioning module and an entity word processing module, wherein the information sending module is used for responding to a page access request and sending address information of a page to be accessed to a server side so as to acquire entity word data of the page to be accessed returned by the server side, the entity word positioning module is used for positioning at least one entity word from the page to be accessed based on the entity word data, and the entity word processing module is used for carrying out clickable processing on the entity word to obtain a target page.
According to another aspect of the disclosure, a page processing device is provided, which comprises an entity word determining module, and an entity word sending module, wherein the entity word determining module is used for determining entity word data of a page to be accessed based on address information of the page to be accessed in response to receiving the address information of the page to be accessed from a client, and the entity word sending module is used for sending the entity word data of the page to be accessed to the client, so that the client can obtain at least one entity word from the page to be accessed in a positioning mode based on the entity word data, and clickable processing is carried out on the entity word to obtain a target page.
According to another aspect of the present disclosure, there is provided an electronic device comprising at least one processor, and a memory communicatively coupled to the at least one processor, wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method as described above.
According to another aspect of the present disclosure, there is provided a non-transitory computer-readable storage medium storing computer instructions for causing the computer to perform the method as described above.
According to another aspect of the present disclosure, there is provided a computer program product comprising a computer program which, when executed by a processor, implements a method as described above.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to be used to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following specification.
Drawings
The drawings are for a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 schematically illustrates an exemplary system architecture to which page processing methods and apparatus may be applied, according to embodiments of the present disclosure;
FIG. 2 schematically illustrates a flow chart of a page processing method according to an embodiment of the disclosure;
FIG. 3A schematically illustrates a page display style and page text of a server side of a page processing method according to an embodiment of the present disclosure;
FIG. 3B schematically illustrates a page display style and page text of a client of a page processing method according to an embodiment of the present disclosure;
Fig. 3C schematically illustrates a page display style and page text of a client of a page processing method according to another embodiment of the present disclosure;
FIG. 4 schematically illustrates a flow chart of a page processing method according to another embodiment of the present disclosure;
FIG. 5 schematically illustrates a block diagram of a page processing apparatus according to an embodiment of the disclosure;
FIG. 6 schematically shows a block diagram of a page processing apparatus according to another embodiment of the present disclosure, and
Fig. 7 illustrates a schematic block diagram of an example electronic device 700 that may be used to implement embodiments of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present disclosure to facilitate understanding, and should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
FIG. 1 schematically illustrates an exemplary system architecture to which page processing methods and apparatus may be applied, according to embodiments of the present disclosure.
It should be noted that fig. 1 is only an example of a system architecture to which embodiments of the present disclosure may be applied to assist those skilled in the art in understanding the technical content of the present disclosure, but does not mean that embodiments of the present disclosure may not be used in other devices, systems, environments, or scenarios. For example, in another embodiment, an exemplary system architecture to which the page processing method and apparatus may be applied may include a terminal device, but the terminal device may implement the page processing method and apparatus provided by the embodiments of the present disclosure without interaction with a server.
As shown in fig. 1, a system architecture 100 according to this embodiment may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 is used as a medium to provide communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired and/or wireless communication links, and the like.
The user may interact with the server 105 via the network 104 using the terminal devices 101, 102, 103 to receive or send messages or the like. Various communication client applications may be installed on the terminal devices 101, 102, 103, such as a knowledge reading class application, a web browser application, a search class application, an instant messaging tool, a mailbox client and/or social platform software, etc. (as examples only).
The terminal devices 101, 102, 103 may be a variety of electronic devices having a display screen and supporting web browsing, including but not limited to smartphones, tablets, laptop and desktop computers, and the like.
The server 105 may be a server providing various services, such as a background management server (by way of example only) providing support for content browsed by the user using the terminal devices 101, 102, 103. The background management server may analyze and process the received data such as the user request, and feed back the processing result (e.g., the web page, information, or data obtained or generated according to the user request) to the terminal device.
It should be noted that, the page processing method provided by the embodiments of the present disclosure may be generally executed by the terminal device 101, 102, or 103. Accordingly, the page processing apparatus provided by the embodiments of the present disclosure may also be provided in the terminal device 101, 102, or 103.
Or the page processing method provided by the embodiments of the present disclosure may be generally performed by the server 105. Accordingly, the page processing apparatus provided in the embodiments of the present disclosure may be generally disposed in the server 105. The page processing method provided by the embodiments of the present disclosure may also be performed by a server or a server cluster that is different from the server 105 and is capable of communicating with the terminal devices 101, 102, 103 and/or the server 105. Accordingly, the page processing apparatus provided by the embodiments of the present disclosure may also be provided in a server or a server cluster that is different from the server 105 and is capable of communicating with the terminal devices 101, 102, 103 and/or the server 105.
For example, when a user reads an electronic book online, the server 105 may obtain page text that the user needs to access through the terminal devices 101, 102, 103, match the page text in a database, and if the matching is successful, obtain an entity word corresponding to the page text from the database, and send the entity word to the terminal devices 101, 102, 103, so that clickable processing is performed on the obtained entity word on the terminal devices 101, 102, 103.
It should be understood that the number of terminal devices, networks and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
In the technical solution of the disclosure, the collection, storage, use, processing, transmission, provision, disclosure and application of users' personal information all comply with relevant laws and regulations, necessary security measures are taken, and public order and good morals are not violated.
In the technical scheme of the disclosure, the authorization or consent of the user is obtained before the personal information of the user is obtained or acquired.
Fig. 2 schematically shows a flow chart of a page processing method according to an embodiment of the present disclosure.
As shown in FIG. 2, the method includes operations S210-S230.
In operation S210, in response to the page access request, address information of the page to be accessed is sent to the server, so as to obtain entity word data of the page to be accessed returned by the server.
According to embodiments of the present disclosure, the page access request may be generated by a user when accessing a page through a search engine or browser and initiated to a client, and may include address information of a page to be accessed that the user needs to access, where the address information may be, for example, a uniform resource locator (Uniform Resource Locator, URL), and may further include a domain name, an internet protocol address (Internet Protocol Address, an IP address), a file path, and the like.
According to the embodiment of the disclosure, a client analyzes a page access request, determines address information of a page to be accessed carried in the page access request, and sends the address information of the page to be accessed to a server.
According to the embodiment of the disclosure, after receiving the address information of the page to be accessed, the server side may determine the page to be accessed based on the address information and then determine the entity word data of that page. The server side may then send the determined entity word data to the client side, so that the client side performs subsequent processing based on it. The entity word data may include entity words in the page to be accessed on which a user may want to search further.
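As an illustration of the exchange described above, the following is a minimal sketch of what the entity word data and the server-side lookup by address might look like. The class names, the field names, the in-memory table and the example URL are all assumptions made for illustration; the disclosure does not specify a data format or storage mechanism.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class WordElement:
    text: str          # the entity word as it appears in the page text
    context: str = ""  # hypothetical positioning information (e.g. its sentence)


@dataclass
class EntityWordData:
    url: str
    elements: List[WordElement] = field(default_factory=list)


# Hypothetical in-memory table standing in for the server-side store.
ENTITY_STORE: Dict[str, EntityWordData] = {
    "https://example.com/article/1": EntityWordData(
        url="https://example.com/article/1",
        elements=[WordElement("Mercury", "Mercury is also a metal.")],
    ),
}


def lookup_entity_word_data(url: str) -> EntityWordData:
    """Return the entity word data for a page address, or an empty record."""
    return ENTITY_STORE.get(url, EntityWordData(url=url))
```

In this sketch an unknown address yields an empty record, so the client would simply leave the page unchanged.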
In operation S220, at least one entity word is located from the page to be accessed based on the entity word data.
According to the embodiment of the disclosure, after receiving the entity word data sent by the server, the client can analyze the page to be accessed, determine the page text of the page to be accessed, match the page text according to the entity word data, and locate the entity word matched with the entity word data from the page text.
In operation S230, clickable processing is performed on the entity word to obtain a target page.
According to the embodiment of the disclosure, clickable processing converts ordinary text words in the page to be accessed into elements that a user can click. After the clickable processing is completed, the user can click a processed entity word. In response, the client can, for example, jump to a link related to the entity word or directly display an explanation of the entity word, so that the user can further understand the entity word based on the content of the link or of the explanation.
According to the embodiment of the disclosure, clickable processing of the entity word can be completed by adding a hyperlink to the entity word, binding a JavaScript event for the entity word, and the like, wherein when clickable processing of the entity word is completed by the hyperlink, after clicking operation is performed on the processed entity word by a user, the user can jump to a search page corresponding to an address in the hyperlink, wherein the search page can be obtained by searching in a search engine by taking the entity word as a keyword. Under the condition that clickable processing of the entity word is completed by binding the JavaScript event for the entity word, after clicking operation is carried out on the processed entity word by a user, contents such as explanation of the entity word and the like can be displayed in a popup window mode, wherein the explanation contents can be obtained by searching in a search engine by taking the entity word as a keyword and extracting from a page obtained by searching.
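The two modes described above can be sketched as follows. This is only an illustration under stated assumptions: the search endpoint `SEARCH_BASE`, the `entity-word` class and the `data-entity` attribute are hypothetical names, and the popup mode here only emits a marked-up element to which a client script would bind its click handler.

```python
from html import escape
from urllib.parse import quote

# Hypothetical search endpoint; the disclosure does not name one.
SEARCH_BASE = "https://search.example/s?q="


def make_clickable(word: str, mode: str = "hyperlink") -> str:
    """Wrap an entity word in markup that makes it clickable."""
    label = escape(word)
    if mode == "hyperlink":
        # Hyperlink mode: clicking jumps to a search page for the word.
        href = escape(SEARCH_BASE + quote(word), quote=True)
        return f'<a href="{href}">{label}</a>'
    if mode == "popup":
        # Popup mode: emit a marked element; a script (not shown) would
        # bind a click event to it and display the explanation.
        attr = escape(word, quote=True)
        return f'<span class="entity-word" data-entity="{attr}">{label}</span>'
    raise ValueError(f"unknown mode: {mode}")
```

For example, `make_clickable("New York")` produces an anchor whose address carries the percent-encoded entity word as the search keyword.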
According to the embodiment of the disclosure, the entity word after clickable processing can be further subjected to saliency enhancement processing, so that a user can quickly and accurately find the entity word on a target page, wherein the saliency enhancement processing can comprise text highlighting processing, text style transformation processing and the like, and the text style comprises text colors, whether text is thickened, whether text is underlined, text fonts and the like.
According to the embodiment of the disclosure, when a user browses a page through a client, address information of the page to be accessed is sent to a server, and the entity word data returned by the server for that page is received. At the client, clickable processing is performed on the entity words in the page to be accessed based on the entity word data to obtain a target page, and the target page is displayed to the user. Because the entity words indicated by the entity word data are words predetermined to have a high click or search frequency among users, a user who wants to search for an entity word while browsing the target page can simply click the clickable entity word to have related content displayed. This realizes dynamic conversion of the page, significantly lowers the operation threshold for a secondary search from the page, and improves the user experience.
According to the embodiment of the disclosure, the entity word data comprises at least one word element and positioning information of the at least one word element, where the at least one word element corresponds to the at least one entity word. Locating at least one entity word from the page to be accessed based on the entity word data comprises: determining a target text part from the page text of the page to be accessed based on the positioning information of the word element; and locating the entity word corresponding to the word element from the target text part based on the word element.
According to an embodiment of the disclosure, the word element may be at least one entity word determined according to the page text after the server analyzes the page to be accessed and determines the page text of the page to be accessed.
According to the embodiment of the disclosure, the page text may contain several identical word elements. In that case, locating a word element in the page to be accessed by the word element alone yields several positioning results, and it is difficult to tell which of them the entity word data refers to. Therefore, positioning information of the word element can be included in the entity word data to help uniquely determine, in the page to be accessed, the occurrence that the entity word data refers to.
For example, although the plurality of word elements in the page text are the same, the context information of each of the plurality of word elements may be different, and thus, the positioning information of the word elements may include the word elements and the context information of the word elements.
According to the embodiments of the present disclosure, for some special word elements that usually occur in fixed collocations, limiting the context information to only the phrase containing the word element may still yield multiple positioning results that are hard to tell apart. Therefore, the scope of the context information can be chosen at different granularities according to the actual situation, such as the phrase, the whole sentence, or the text segment in which the word element is located.
Taking the whole sentence containing the word element as the granularity of the context information as an example, text matching is performed in the page text of the page to be accessed based on the context information of the word element to obtain a target sentence corresponding to that context information, and the target sentence is taken as the target text part.
According to the embodiment of the disclosure, since the positioning of the entity words needs to be accurate and unique, text matching is performed in an accurate matching mode, namely, information completely consistent with the context information of the word elements is searched in the page text and used as a target sentence, and the target sentence is used as a target text part.
According to the embodiment of the disclosure, based on the word elements, the entity words corresponding to the word elements can be located from the target text part in an accurate matching mode.
According to the embodiment of the disclosure, based on the positioning information of the word element, the target text part can be determined from the page text of the page to be accessed according to the context information of the word element, which narrows the range of the subsequent positioning and matching and improves matching efficiency. Based on the word element, the entity word corresponding to the word element can then be obtained from the target text part by exact matching, which ensures matching accuracy.
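The two-stage matching described above (exact match of the context, then exact match of the word element inside it) can be sketched as below. The function name and the use of a whole sentence as the context are assumptions for illustration; the sketch also assumes that the context occurs only once in the page text.

```python
from typing import Optional, Tuple


def locate_by_context(page_text: str, word: str, context: str) -> Optional[Tuple[int, int]]:
    """Return the (start, end) offsets of `word` inside the exact `context`."""
    # Stage 1: exact match of the context gives the target text part.
    ctx_start = page_text.find(context)
    if ctx_start == -1:
        return None
    # Stage 2: exact match of the word element inside the target text part.
    offset = context.find(word)
    if offset == -1:
        return None
    start = ctx_start + offset
    return start, start + len(word)
```

With the page text "Mercury is a planet. Mercury is also a metal used in thermometers." and the second sentence as context, the sketch selects the second "Mercury" rather than the first.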
According to the embodiment of the disclosure, the positioning information comprises a page display style of the server side and an initial positioning feature of the word element. Determining the target text part from the page text of the page to be accessed based on the positioning information of the word element comprises: adjusting the initial positioning feature of the word element based on the page display style of the server side and the page display style of the client side to obtain a target positioning feature; and matching in the page text of the page to be accessed based on the target positioning feature to obtain the target text part.
According to embodiments of the present disclosure, the initial positioning feature may be used to represent the position of a word element on the page to be accessed under the page display style of the server, for example, the B-th line of the A-th segment.
According to the embodiment of the disclosure, in the case that the page display style of the server and the page display style of the client are the same, the initial positioning feature may be taken as the target positioning feature. In the case that the two page display styles are different, the position that corresponds to the initial positioning feature under the page display style of the client may be determined based on the difference between the two styles, and that position may be taken as the target positioning feature.
For example, suppose the page display style of the server side indicates that each line includes 40 characters, the page display style of the client side indicates that each line includes 20 characters, and the initial positioning feature indicates that the word element is in the 4th line of the 3rd segment of the page to be accessed. The position of the word element under the page display style of the client side can then be calculated: the segment number does not change, and the line number is determined from the number of characters per line under the two page display styles. It can therefore be determined that the target positioning feature indicates that the word element is in the 7th to 8th lines of the 3rd segment of the page to be accessed at the client side.
According to the embodiment of the disclosure, after the target positioning feature is determined, the text located at the position corresponding to the target positioning feature can be determined as a target text part from the page text in the page display mode of the client.
According to the embodiment of the disclosure, according to the page display style of the client, the page display style of the server and the initial positioning characteristics of the word elements, the target positioning characteristics of the word elements in the client can be determined, so that the word elements can be rapidly positioned in the page text of the page to be accessed based on the target positioning characteristics, and the target text parts can be obtained through matching. Because the initial positioning feature and the target positioning feature can be expressed in the form of segment numbers and line numbers, a relatively small target text part can be positioned according to the target positioning feature so as to carry out further keyword matching in the target text part, and therefore, the efficiency and the accuracy of the matching process are improved.
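The line-number adjustment in the example above can be worked through with a small sketch. It assumes that a page display style is fully described by a fixed number of characters per line and that every line before the one of interest is full; real layouts (variable-width fonts, word wrapping) would need more information. The function name is hypothetical.

```python
from typing import Tuple


def convert_position(segment: int, line: int,
                     server_chars: int, client_chars: int) -> Tuple[int, int, int]:
    """Map a (segment, line) position from the server style to the client style.

    Returns (segment, first_line, last_line) with 1-based line numbers.
    """
    # Character offsets (0-based, within the segment) covered by the server line.
    first_char = (line - 1) * server_chars
    last_char = line * server_chars - 1
    # The same characters fall on these client lines; the segment is unchanged.
    first_line = first_char // client_chars + 1
    last_line = last_char // client_chars + 1
    return segment, first_line, last_line
```

For the figures in the example, `convert_position(3, 4, 40, 20)` covers characters 121 to 160 of the segment, which lie on client lines 7 and 8 of segment 3.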
According to the embodiment of the disclosure, based on word elements, entity words corresponding to the word elements are located in a target text portion, and the method comprises the steps of carrying out keyword matching on the target text portion by taking the word elements as keywords to obtain matching results, and locating the entity words corresponding to the word elements in the target text portion based on the matching results.
According to embodiments of the present disclosure, the matching result may be used to represent where the word element is located in the target text portion.
According to the embodiment of the disclosure, in the case that the matching result indicates that the target text portion contains more than one occurrence of the word element, the entity word data may further include a sequence number indicating which of these occurrences is the entity word to be located, so that the entity word is accurately determined, according to the sequence number, from the plurality of occurrences obtained by exact matching.
According to the embodiment of the disclosure, from the target text part, further accurate matching is performed according to the word elements, so that the entity words can be accurately positioned, and the accuracy of determining the entity words is improved.
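Keyword matching with a sequence number can be sketched as follows. The function name and the 1-based, non-overlapping counting of occurrences are assumptions for illustration.

```python
from typing import Optional


def locate_nth(target_text: str, word: str, ordinal: int = 1) -> Optional[int]:
    """Return the start offset of the `ordinal`-th occurrence of `word`."""
    if ordinal < 1 or not word:
        return None
    pos = -1
    cursor = 0
    for _ in range(ordinal):
        pos = target_text.find(word, cursor)
        if pos == -1:
            return None  # fewer occurrences than the sequence number
        cursor = pos + len(word)  # continue after this occurrence
    return pos
```

In the target text "the bank by the river bank", a sequence number of 2 selects the second "bank"; a sequence number larger than the number of occurrences yields no match.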
According to the embodiment of the disclosure, the at least one word element is configured to be arranged according to the text content order of the page to be accessed. Determining the target text part from the page text of the page to be accessed based on the positioning information of the word element comprises: for the i-th word element, pruning the page text based on the position in the page text of the entity word corresponding to the (i-1)-th word element to obtain a pruned page text, where i is an integer greater than 1; and determining the target text part from the pruned page text based on the positioning information of the i-th word element.
According to an embodiment of the present disclosure, the text content of the page text of the page to be accessed includes at least one entity word, where each entity word may be a character, a word or a phrase. The text content order of the page to be accessed can be determined from the page text, and from it the order relationship among the pieces of text content in the page text. Since the at least one word element corresponds to the at least one entity word, each entity word can be converted into a word element, and the word elements can be arranged according to that order relationship.
According to the embodiment of the present disclosure, since the word elements are arranged in text content order, the entity words corresponding to word elements that follow a given word element are not located in the text content preceding it. Therefore, when a word element is matched, the text content preceding the entity word of the previous word element can be deleted.
According to the embodiment of the disclosure, for the i-th word element, the page text preceding the entity word corresponding to the (i-1)-th word element may be deleted, and the target text part may be determined from the pruned page text. According to the positioning information of the i-th word element, the entity word corresponding to the i-th word element can be located in the pruned page text, thereby determining the target text part.
According to the embodiment of the disclosure, when locating a current word element among the word elements arranged in the text content order of the page to be accessed, the word element immediately before it can be determined, the page text preceding that earlier word element can be pruned to obtain a pruned page text, and the target text part can then be located and determined in the pruned page text according to the current word element. As analyzed above, this pruning does not affect the accuracy of the matching process, so the size of the text to be matched can be reduced while matching accuracy is preserved, which improves the efficiency of determining the target text part.
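The ordered matching with pruning described above can be sketched as follows. Two assumptions are made for illustration: instead of physically deleting text, the sketch advances a search cursor, which restricts matching to the same remaining text; and the cursor is moved to the end of the previous match rather than its start, so that a repeated word element is not matched to the same occurrence twice. The function name is hypothetical.

```python
from typing import List


def locate_in_order(page_text: str, words: List[str]) -> List[int]:
    """Locate word elements that are listed in text content order.

    Returns the start offset of each word, or -1 if it is not found.
    """
    positions = []
    cursor = 0  # everything before the cursor counts as pruned
    for word in words:
        pos = page_text.find(word, cursor)
        positions.append(pos)
        if pos != -1:
            cursor = pos + len(word)  # prune up to the end of this match
    return positions
```

With the page text "Apple makes phones. Apple also makes laptops." and the word elements "Apple", "phones", "Apple", "laptops", the second "Apple" is matched in the second sentence because the text up to "phones" has already been pruned.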
According to the embodiment of the disclosure, clickable processing is performed on entity words to obtain a target page, and the clickable processing comprises the steps of obtaining address information of a search page aiming at the entity words, generating page tags based on the address information, and modifying the entity words by using the page tags to perform clickable processing on the entity words to obtain the target page.
According to the embodiment of the disclosure, the search page aiming at the entity word can be determined according to the relevance of the search result after the entity word is input into a search engine for searching. From a plurality of candidate pages included in the search result, a page with higher correlation can be selected as a search page, and address information of the search page is acquired.
According to embodiments of the present disclosure, page tags are generated based on the address information. A page tag may apply saliency enhancement to the entity word, for example an underline or a highlighted area, and carries a hyperlink that jumps to the address information.
According to the embodiment of the disclosure, the entity word is modified with the page tag, that is, the saliency enhancement corresponding to the page tag is applied to the entity word to obtain the target page. For example, when the page tag underlines the entity word, the modified entity word is the entity word with an underline added. Because the page tag carries the hyperlink, clicking the modified entity word jumps to the search page through the hyperlink, which completes the clickable processing of the entity word.
According to the embodiment of the disclosure, page tags are generated based on the address information of the search page for the entity words, and the entity words are modified with the page tags to complete the clickable processing. Therefore, clicking an entity word after clickable processing triggers the operation corresponding to the page tag and jumps to the search page for that entity word, which ensures the accuracy of the clickable processing.
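The tag generation and modification steps can be illustrated with a short sketch. It assumes the page text is plain text, that the page tag is an underlined HTML anchor, and that the search page address is built from a hypothetical prefix `SEARCH_URL_PREFIX`; none of these names come from the disclosure.

```python
from html import escape
from typing import Tuple
from urllib.parse import quote

# Hypothetical search endpoint used only for illustration.
SEARCH_URL_PREFIX = "https://search.example/s?wd="


def search_page_url(entity_word: str) -> str:
    """Build the address of a search page that uses the entity word as keyword."""
    return SEARCH_URL_PREFIX + quote(entity_word)


def make_page_tag(entity_word: str, url: str) -> str:
    """Generate a page tag: an underlined hyperlink wrapping the entity word."""
    return '<a href="{}" style="text-decoration: underline">{}</a>'.format(
        escape(url, quote=True), escape(entity_word)
    )


def make_clickable(page_text: str, span: Tuple[int, int]) -> str:
    """Replace the entity word at `span` with its page tag."""
    start, end = span
    word = page_text[start:end]
    return page_text[:start] + make_page_tag(word, search_page_url(word)) + page_text[end:]


target = make_clickable("Paris is nice.", (0, 5))
```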
Fig. 3A schematically illustrates a page display style and page text of a server side of a page processing method according to an embodiment of the present disclosure.
As shown in fig. 3A, the page includes a title and a body, and the body includes two text segments. The entity word data of the page to be accessed returned by the server includes entity word 1 in the title, entity word 2 and entity word 3 in the first text segment, and entity word 4 in the second text segment. According to the page display style of the server, it can be determined that the initial positioning feature indicates that entity word 1 is located in the title, entity word 2 and entity word 3 are located in the first line of the first text segment, and entity word 4 is located in the first line of the second text segment.
Fig. 3B schematically illustrates a page display style and page text of a client of the page processing method according to an embodiment of the present disclosure.
As shown in fig. 3B, in this embodiment the page display style of the client is the same as the page display style of the server in fig. 3A, so the initial positioning feature may be determined as the target positioning feature. According to the target positioning feature, it can be determined from the page text that the title includes entity word 1, the first line of the first text segment includes entity word 2 and entity word 3, and the first line of the second text segment includes entity word 4.
After the at least one entity word is determined, clickable processing is performed on each entity word. Compared with the entity word before clickable processing, the processed entity word additionally has an underline and a query identifier, which prompt the user that the entity word can be clicked for a further query.
Fig. 3C schematically illustrates a page display style and page text of a client of a page processing method according to another embodiment of the present disclosure.
As shown in fig. 3C, in this embodiment the page display style of the client differs from the page display style of the server in fig. 3A: the number of characters that each line can accommodate on the client is half the number that each line can accommodate on the server. The target positioning feature can therefore be determined according to the page display style of the client, the page display style of the server, and the initial positioning feature. The target positioning feature indicates that the title includes entity word 1, the first and second lines of the first text segment include entity word 2 and entity word 3, and the first and second lines of the second text segment include entity word 4.
After the at least one entity word is determined, clickable processing is performed on each entity word. Compared with the entity word before clickable processing, the processed entity word additionally has an underline and a query identifier, which prompt the user that the entity word can be clicked for a further query.
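The adjustment from the initial positioning feature to the target positioning feature in fig. 3B and fig. 3C can be sketched as a line-range conversion. The sketch assumes fixed-width characters with no word wrapping and treats the positioning feature as a 1-based line number within a text segment; the function name `adjust_line_range` is hypothetical.

```python
from typing import Tuple


def adjust_line_range(server_line: int, server_chars_per_line: int,
                      client_chars_per_line: int) -> Tuple[int, int]:
    """Map a 1-based server-side line to the client-side lines covering the same characters."""
    # Character offsets (0-based, within the text segment) covered by the server line.
    first_char = (server_line - 1) * server_chars_per_line
    last_char = server_line * server_chars_per_line - 1
    # Client lines (1-based) that contain those characters.
    first_line = first_char // client_chars_per_line + 1
    last_line = last_char // client_chars_per_line + 1
    return first_line, last_line


same_style = adjust_line_range(1, 40, 40)   # fig. 3B: identical display styles
half_width = adjust_line_range(1, 40, 20)   # fig. 3C: client lines hold half as many characters
```

With identical styles the first line maps to itself, and with half-width client lines the first server line maps to the first and second client lines, matching the two figures.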
According to the embodiment of the disclosure, the page processing method further comprises the step of responding to the click operation of the target entity word in the target page, and displaying the search page taking the target entity word as the search keyword.
According to the embodiment of the disclosure, after clickable processing of the entity words is completed, the user can click an entity word on the target page. After the user clicks a target entity word, the client can trigger the hyperlink on the target entity word based on the click operation and the target entity word it is aimed at, and jump to the search page corresponding to the address information of the hyperlink, namely the search page that takes the target entity word as the search keyword.
According to the embodiment of the disclosure, in response to a click operation on the target entity word, the client jumps to the search page through a hyperlink or similar means to display detailed information about the target entity word, which simplifies searching for entity words in the target page and improves the convenience and accuracy of the operation.
Fig. 4 schematically illustrates a flow chart of a page processing method according to another embodiment of the present disclosure.
As shown in FIG. 4, the method includes operations S410-S420.
In operation S410, entity word data of a page to be accessed is determined based on address information of the page to be accessed in response to receiving the address information of the page to be accessed from the client.
According to the embodiment of the disclosure, after a user initiates an access operation to a page to be accessed through a client, the server may receive address information of the page to be accessed.
According to the embodiment of the disclosure, the server side can determine whether the page to be accessed is a pre-processed page according to the address information of the page to be accessed. If it is determined that the entity word data corresponding to the address information exists in the database of the server side, the entity word data can be directly read from the database. If it is determined that the database of the server does not contain the entity word data corresponding to the address information, the server can process the address information in real time to obtain the entity word data of the page to be accessed.
In operation S420, entity word data of the page to be accessed is sent to the client, so that the client locates at least one entity word from the page to be accessed based on the entity word data, and clickable processing is performed on the entity word to obtain the target page.
According to the embodiment of the disclosure, after receiving the address information of the page to be accessed, the server side can determine entity word data according to the address information and send the entity word data of the page to be accessed to the client side, so that the client side can process the page to be accessed into a target page comprising clickable entity words. The server is utilized to execute the process of determining the entity word data, so that the processing load of the client can be reduced, and the performance of the client is ensured.
According to the embodiment of the disclosure, the page processing method further comprises the steps of obtaining a plurality of pages, extracting feature data based on page texts of the pages to obtain entity word data of the pages, and storing the entity word data of the pages in a database by taking address information of the pages as an index.
According to the embodiment of the present disclosure, a plurality of pages with the highest access frequency may be selected according to the respective access frequencies of the pages, and the plurality of pages may be acquired respectively, where the acquiring means is not limited herein as long as the pages can be acquired.
For example, the page can be obtained by directly crawling the page content. Alternatively, the front-end code of the page can be obtained through a developer tool and parsed, and the page can be obtained from the content in the corresponding tags. Alternatively, a screenshot of the page can be captured automatically and the image content converted into text by optical character recognition.
According to the embodiment of the disclosure, from the page text, the words conforming to the preset part of speech are determined to be the entity words, and the entity words are combined with the positioning information of the entity words in the page text to obtain the entity word data.
According to the embodiment of the disclosure, after the analysis of the page text is completed, the address information of the page is used as an index, and at least one entity word data corresponding to the page is stored in the database, so that whether a record corresponding to the address information exists in the database can be rapidly determined when the inquiry is performed based on the address information of the page to be accessed later.
According to the embodiment of the disclosure, by acquiring a plurality of pages, extracting the feature data of the pages, determining the entity word data of the pages, and storing the address information of the pages and the entity word data into the database, a large amount of data can be obtained in advance. Therefore, when responding to the request of the client, the query matching is directly carried out from the database, the response speed is improved, and the user experience is optimized. In addition, since the page text of the page can be obtained by acquiring the page, the page processing method can be used for determining entity word data for any page on the Internet, so that the application range of the page processing method is enlarged.
According to the embodiment of the disclosure, extracting feature data from the page text of a page to obtain the entity word data of the page includes: performing word segmentation on the page text to obtain a plurality of segmented words; screening at least one word element from the plurality of segmented words based on their parts of speech; determining positioning information of each of the at least one word element from the page text; and obtaining the entity word data of the page based on the at least one word element and its respective positioning information.
According to the embodiment of the disclosure, the page text can be segmented according to the proper granularity by utilizing a natural language processing means to obtain a plurality of segmented words. The granularity of the word segmentation can be determined according to a professional dictionary, word frequency statistics, grammar semantic rules and the like.
According to the embodiment of the disclosure, the part of speech of each segmented word is determined, and segmented words of a preset part of speech are determined to be word elements. The preset parts of speech may include nouns, verbs, etc. Words of these parts of speech carry substantive meaning and strongly influence semantics, so they need to be retained in the screening process. Stop words with less semantic influence, such as pronouns, prepositions, and conjunctions, do not need to be retained.
According to an embodiment of the present disclosure, after determining at least one word element, respective positioning information of the at least one word element is determined from the page text, wherein the positioning information may be used to represent a position of the at least one word element in the page text.
According to the embodiment of the disclosure, word segmentation is performed on the page text, at least one word element meeting the requirements is screened from the resulting segmented words according to part of speech, and the entity word data of the page is then obtained. By deleting stop words with little influence on semantics, such as pronouns, prepositions, conjunctions, and auxiliary words, the deleted segmented words have little effect on semantics, while the number of word elements is reduced and resource waste is lowered.
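The part-of-speech screening can be sketched as a simple filter. The sketch assumes that segmentation and part-of-speech tagging have already been done by some natural language processing tool and are supplied as (word, part of speech) pairs; the tag names and the function name `screen_word_elements` are hypothetical.

```python
from typing import Iterable, List, Tuple

# Preset parts of speech to keep (assumed tag names).
KEPT_PARTS_OF_SPEECH = {"noun", "verb"}


def screen_word_elements(tagged_words: Iterable[Tuple[str, str]]) -> List[str]:
    """Keep segmented words whose part of speech is in the preset set, in text order."""
    return [word for word, part_of_speech in tagged_words
            if part_of_speech in KEPT_PARTS_OF_SPEECH]


tagged = [
    ("she", "pronoun"), ("visited", "verb"), ("the", "article"),
    ("Louvre", "noun"), ("in", "preposition"), ("Paris", "noun"),
]
word_elements = screen_word_elements(tagged)
```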
According to the embodiment of the disclosure, the positioning information includes context information of the word elements. Determining the positioning information of each word element from the page text includes: determining, from the page text, the target sentence in which the word element is located; and obtaining the context information of the word element based on the target sentence.
According to the embodiment of the disclosure, the position of the word element in the page text is determined, and the target sentence in which the word element is located is determined from the nearest text separators before and after the word element, where the text separators may include sentence-ending punctuation, line feed characters, and the like. After the target sentence is determined, it may be used directly as the context information of the word element.
According to the embodiment of the disclosure, after the target sentence is determined, matching can be performed in the page text based on the target sentence. If more than one sentence identical to the target sentence is found and the target sentence is nevertheless used directly as the context information, the client cannot accurately locate a unique entity word during subsequent page processing.
In this case, the target sentence may be expanded forward or backward. For example, the target sentence and its next sentence may be determined, matching may be performed in the page text based on the two sentences together, and when one successful matching result is determined, the target sentence and its next sentence may be used as the context information of the word element.
According to the embodiment of the disclosure, the context information of the word element is determined according to the target sentence where the word element is located, and the probability of matching repetition in the process of matching the data can be reduced by expanding the granularity of the data, so that the accuracy and pertinence of page processing are improved.
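The context expansion described above can be sketched as follows. This is an illustration under assumptions: sentences are split with a simple regular expression on sentence-ending punctuation and line feeds, expansion is only toward following sentences, and the function names `split_sentences` and `context_for` are hypothetical.

```python
import re
from typing import List


def split_sentences(page_text: str) -> List[str]:
    """Split text at sentence-ending punctuation and line feeds (simplified)."""
    pieces = re.findall(r"[^.!?。！？\n]+[.!?。！？]?", page_text)
    return [piece.strip() for piece in pieces if piece.strip()]


def context_for(sentences: List[str], index: int) -> List[str]:
    """Return the shortest run of sentences starting at `index` that occurs exactly once."""
    for length in range(1, len(sentences) - index + 1):
        window = sentences[index:index + length]
        matches = sum(
            sentences[i:i + length] == window
            for i in range(len(sentences) - length + 1)
        )
        if matches == 1:
            return window
    # No unique run exists; fall back to everything from the target sentence on.
    return sentences[index:]


sentences = split_sentences("It rained. Paris was quiet. It rained. Rome was loud.")
unique_context = context_for(sentences, 1)    # target sentence is already unique
expanded_context = context_for(sentences, 0)  # duplicated sentence, expanded by one
```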
According to the embodiment of the disclosure, the page processing method further comprises the steps of carrying out type recognition on the page to obtain a recognition result, and extracting feature data based on the page text of the page to obtain entity word data of the page when the recognition result indicates that the type of the page is a preset type.
According to the embodiment of the disclosure, the front-end code of the page can be parsed, and the recognition result can be determined from the tags present in the page and the content in the tags. Taking HyperText Markup Language (HTML) as an example, the HTML source code of a page can be acquired, and the number of tags in the source code and the content in the tags can be determined. When the source code consists mainly of tags such as <p> and <div>, the page can be determined to be an article page whose main content is an article; when the source code consists mainly of tags such as <iframe> and <video>, the page can be determined to be a video page whose main content is video.
The page processing method mainly processes page text, so the preset type may include an article page whose main body is article content, and the like. In addition, when the recognition result indicates that the page is a picture page whose main body is a picture, optical character recognition may be performed on the picture. When the text content or the number of characters in the picture reaches a preset number, the result of the optical character recognition may be used as the page text and processed by the page processing method.
According to the embodiment of the disclosure, the type of the page is identified, and whether the page is subjected to the page processing is determined based on the identification result, so that the page of the type which does not need to be subjected to the page processing can be filtered, the resource waste of the server side is reduced, and the resource utilization rate is improved.
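The tag-based type recognition can be sketched with the Python standard library HTML parser. The decision rule below (compare the count of text-oriented tags with the count of video-oriented tags) is an assumption made for illustration, since the disclosure does not specify a threshold; `TagCounter` and `classify_page` are hypothetical names.

```python
from collections import Counter
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Count the opening tags that appear in an HTML document."""

    def __init__(self) -> None:
        super().__init__()
        self.counts: Counter = Counter()

    def handle_starttag(self, tag, attrs):
        self.counts[tag] += 1


def classify_page(html_source: str) -> str:
    """Return 'article', 'video', or 'unknown' from tag counts (assumed rule)."""
    counter = TagCounter()
    counter.feed(html_source)
    text_tags = counter.counts["p"] + counter.counts["div"]
    video_tags = counter.counts["iframe"] + counter.counts["video"]
    if video_tags > text_tags:
        return "video"
    if text_tags > 0:
        return "article"
    return "unknown"


article_type = classify_page("<html><body><p>a</p><p>b</p><div>c</div></body></html>")
video_type = classify_page("<body><video src='x'></video><iframe src='y'></iframe><p>t</p></body>")
```

Only pages classified as the preset type (here, "article") would then go on to feature data extraction.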
According to the embodiment of the disclosure, determining the entity word data of the page to be accessed based on the address information of the page to be accessed comprises the step of carrying out data retrieval in a database by taking the address information of the page to be accessed as an index so as to obtain the entity word data of the page to be accessed.
According to the embodiment of the disclosure, after feature data is extracted from the page text of a page to obtain the entity word data of the page, the address information and the entity word data are stored in the database in the form of key-value pairs. After the server receives the address information of the page to be accessed sent by the client, data retrieval can be performed in the database based on the address information to determine whether a record with the address information as its KEY exists in the database.
According to the embodiment of the disclosure, in the case that it is determined that a record using address information as KEY exists in the database, the VALUE corresponding to the record may be read and determined as the entity word data of the page to be accessed.
According to the embodiment of the disclosure, the data is stored and queried in the form of the key value pairs, so that the data query efficiency can be improved, and the accuracy of the query result can be ensured.
According to the embodiment of the disclosure, performing data retrieval in the database with the address information of the page to be accessed as an index to obtain the entity word data of the page to be accessed includes: performing data retrieval in the database with the address information of the page to be accessed as the index to obtain initial entity word data of the page to be accessed; and performing data processing on the initial entity word data to obtain the entity word data of the page to be accessed.
According to an embodiment of the present disclosure, in the case where a record having address information as KEY exists in the database, the VALUE of the record, that is, the initial entity word data, includes all entity word data in the page text of the page to be accessed. Typically, most entity word data, while affecting semantics, is itself relatively easy to understand and does not require further searching.
In this case, if all entity word data are sent to the client and clickable processing is performed on the entity words, the clickable items in the target page are too many, and a situation of false touch is easy to occur, so that customer experience is affected. In addition, for entity words which do not have the requirement of further searching, even if clickable processing is performed on the entity words, a user seldom actively clicks, and a client can perform a large amount of invalid processing, so that the waste of computing resources is caused.
According to the embodiment of the disclosure, after the initial entity word data is determined, further data processing can be performed on the initial entity word data, and part of entity word data which does not have the requirement of further searching is deleted, so that the entity word data of the page to be accessed is obtained.
According to the embodiment of the disclosure, performing data processing on the initial entity word data of the page to be accessed to obtain the entity word data of the page to be accessed includes: obtaining a click rate score of each of at least one initial word element included in the initial entity word data, where the click rate score of an initial word element is obtained by performing click rate estimation on that initial word element; screening at least one target word element from the at least one initial word element based on the click rate scores; and obtaining the entity word data of the page to be accessed based on the at least one target word element and its respective positioning information.
According to the embodiment of the disclosure, click rate estimation can be performed on each of the at least one initial word element to obtain an estimated probability that the corresponding initial entity word will be clicked after clickable processing, and this estimated probability is used as the click rate score of the initial word element.
According to the embodiment of the disclosure, the at least one initial word element is ranked by click rate score, and a preset number of word elements with the highest click rate scores are selected as target word elements. When the number of initial word elements is smaller than the preset number, all the initial word elements can be directly determined as target word elements.
According to the embodiment of the disclosure, the initial word elements are ranked by click rate score and screened to obtain the target word elements, from which the entity word data of the page to be accessed is obtained. When there are many initial word elements, those more likely to be clicked are preferentially taken as target word elements, which reduces the number of clickable entity words after page processing based on the entity word data and lowers the possibility of misoperation. Because the click rate scores of the clickable entity words are relatively high, the clickable entity words are used more often, which improves the utilization of the client's computing resources.
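The screening by click rate score can be sketched as a top-k selection. The sketch assumes the click rate scores have already been estimated by some model and are supplied as a mapping; the function name `select_target_elements` and the parameter `preset_number` are hypothetical.

```python
from typing import Dict, List


def select_target_elements(click_rate_scores: Dict[str, float], preset_number: int) -> List[str]:
    """Select up to `preset_number` word elements with the highest click rate scores."""
    if len(click_rate_scores) <= preset_number:
        # Few enough initial word elements: keep them all, in their original order.
        return list(click_rate_scores)
    ranked = sorted(click_rate_scores, key=click_rate_scores.get, reverse=True)
    return ranked[:preset_number]


scores = {"weather": 0.1, "Louvre": 0.9, "Paris": 0.5}
top_two = select_target_elements(scores, 2)
keep_all = select_target_elements(scores, 5)
```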
According to the embodiment of the disclosure, performing data processing on the initial entity word data of the page to be accessed to obtain the entity word data of the page to be accessed includes: sorting the at least one initial word element included in the initial entity word data based on the text content of the page; and obtaining the entity word data of the page based on the sorted word elements and their respective positioning information.
According to an embodiment of the present disclosure, the at least one initial word element is sorted according to its respective positioning information in the page text and the text content. The word elements and their respective positioning information are arranged from front to back to obtain the entity word data of the page.
According to the embodiment of the disclosure, the at least one initial word element is sorted to obtain the entity word data of the page, so that the word elements in the entity word data are arranged in the order in which they appear in the page text. As a result, when word elements are matched during positioning in the page to be accessed, deleting the text content before the previous word element does not affect the matching process.
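The ordering step can be sketched as a sort by position in the page text. The sketch assumes each word element is stored with a context string that occurs in the page text and contains the word; the record layout and the function name `order_by_position` are hypothetical.

```python
from typing import Dict, List


def order_by_position(page_text: str, elements: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Sort word elements by where they appear in the page text, front to back."""

    def position(element: Dict[str, str]) -> int:
        # Assumption: the context occurs in the page text and contains the word.
        context_start = page_text.find(element["context"])
        return context_start + element["context"].find(element["word"])

    return sorted(elements, key=position)


page_text = "Rome is old. Paris is chic."
elements = [
    {"word": "Paris", "context": "Paris is chic."},
    {"word": "Rome", "context": "Rome is old."},
]
ordered = order_by_position(page_text, elements)
```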
According to the embodiment of the disclosure, the page processing method further comprises the steps of acquiring page text of the page to be accessed based on address information of the page to be accessed under the condition that entity word data of the page to be accessed cannot be retrieved from the database, and extracting feature data based on the page text of the page to be accessed to obtain the entity word data of the page to be accessed.
According to the embodiment of the disclosure, since pages on the Internet are added and updated quickly, the server may not have preprocessed in advance the page that the client needs to access. In this case, the server database does not include the address information of the page to be accessed or the corresponding entity word data.
According to the embodiment of the disclosure, under the condition that the database does not include the entity word data of the page to be accessed, after the address information of the page to be accessed is obtained, the page text of the page to be accessed can be obtained in real time according to the address information of the page to be accessed, and the feature data extraction is performed on the page text to obtain the entity word data of the page to be accessed.
According to another embodiment of the disclosure, the client may also send the page text of the page to be accessed to the server together with the address information.
According to the embodiment of the disclosure, after the entity word data of the page to be accessed is obtained, the entity word data of the page to be accessed can be stored in the database by taking the address information of the page to be accessed as an index, so that after a subsequent client initiates an access request for the page to be accessed again, the server can directly obtain the entity word data of the page to be accessed from the database without carrying out real-time processing again, and the response speed is improved.
According to the embodiment of the disclosure, when the database of the server does not include the entity word data of the page to be accessed, the content of the page to be accessed is analyzed and processed in real time according to its address information to obtain the entity word data of the page to be accessed. Entity word data can therefore be provided for any page to be accessed, and the page processing method is not limited to preprocessed pages, which improves its universality and flexibility.
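The server-side lookup with real-time fallback can be sketched as follows. The sketch assumes an in-memory dictionary in place of the database and takes the page fetching and feature extraction steps as injected callables; the class name `EntityWordService` and its methods are hypothetical.

```python
from typing import Callable, Dict, List


class EntityWordService:
    """Look up entity word data by address, extracting and storing it on a miss."""

    def __init__(self, fetch_page_text: Callable[[str], str],
                 extract_entity_words: Callable[[str], List[str]]) -> None:
        self._database: Dict[str, List[str]] = {}  # KEY: address, VALUE: entity word data
        self._fetch_page_text = fetch_page_text
        self._extract_entity_words = extract_entity_words

    def get_entity_word_data(self, address: str) -> List[str]:
        data = self._database.get(address)
        if data is None:
            # No record for this address: process the page in real time.
            page_text = self._fetch_page_text(address)
            data = self._extract_entity_words(page_text)
            # Store the result so later requests are served from the database.
            self._database[address] = data
        return data


fetch_calls: List[str] = []


def fake_fetch(address: str) -> str:
    fetch_calls.append(address)
    return "Paris Louvre"


service = EntityWordService(fake_fetch, lambda text: text.split())
first = service.get_entity_word_data("https://example.com/a")
second = service.get_entity_word_data("https://example.com/a")
```

In this example the second request for the same address is answered from the stored record, so the page is fetched only once.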
Fig. 5 schematically shows a block diagram of a page processing apparatus according to an embodiment of the present disclosure.
As shown in fig. 5, the page processing apparatus 500 of this embodiment includes an information sending module 510, an entity word positioning module 520, and an entity word processing module 530.
The information sending module 510 is configured to send, in response to a page access request, address information of a page to be accessed to a server, so as to obtain entity word data of the page to be accessed returned by the server.
The entity word positioning module 520 is configured to position and obtain at least one entity word from the page to be accessed based on the entity word data.
The entity word processing module 530 is configured to perform clickable processing on the entity word to obtain a target page.
According to an embodiment of the present disclosure, the entity word positioning module 520 includes a target determination sub-module and an entity word positioning sub-module.
The target determination sub-module is used for determining a target text portion from the page text of the page to be accessed based on the positioning information of the word elements.
The entity word positioning sub-module is used for locating the entity word corresponding to the word element in the target text portion based on the word element.
According to an embodiment of the present disclosure, the target determination sub-module includes a first text matching unit.
The first text matching unit is used for performing text matching in the page text of the page to be accessed based on the context information of the word elements, to obtain a target sentence corresponding to the context information of the word elements and thereby obtain the target text portion.
According to an embodiment of the present disclosure, the target determination sub-module includes a feature adjustment unit and a second text matching unit.
The feature adjustment unit is used for adjusting the initial positioning feature of the word element based on the page display style of the server and the page display style of the client to obtain the target positioning feature.
The second text matching unit is used for obtaining the target text portion by matching in the page text of the page to be accessed based on the target positioning feature.
According to an embodiment of the present disclosure, the entity word positioning sub-module includes a keyword matching unit and a word positioning unit.
The keyword matching unit is used for performing keyword matching on the target text portion with the word elements as keywords to obtain a matching result.
The word positioning unit is used for locating the entity word corresponding to the word element in the target text portion based on the matching result.
According to an embodiment of the present disclosure, the target determination sub-module includes a text pruning unit and a third text matching unit.
The text pruning unit is used for pruning the page text according to the position, in the page text, of the entity word corresponding to the (i-1)-th word element to obtain the pruned page text, where i is an integer greater than 1.
The third text matching unit is used for determining the target text portion from the pruned page text based on the positioning information of the i-th word element.
According to an embodiment of the present disclosure, the entity word processing module 530 includes an address acquisition sub-module, a tag generation sub-module, and an entity word modification sub-module.
The address acquisition sub-module is used for acquiring the address information of the search page for the entity word.
The tag generation sub-module is used for generating the page tag based on the address information.
The entity word modification sub-module is used for modifying the entity word with the page tag, so as to perform clickable processing on the entity word and obtain the target page.
According to an embodiment of the present disclosure, the page processing apparatus 500 further includes a page presentation module.
The page presentation module is used for displaying, in response to a click operation on a target entity word in the target page, the search page that takes the target entity word as the search keyword.
Fig. 6 schematically shows a block diagram of a page processing apparatus according to another embodiment of the present disclosure.
As shown in fig. 6, the page processing apparatus 600 of this embodiment includes an entity word determining module 610 and an entity word transmitting module 620.
The entity word determining module 610 is configured to determine entity word data of a page to be accessed based on address information of the page to be accessed in response to receiving the address information of the page to be accessed from the client.
The entity word sending module 620 is configured to send the entity word data of the page to be accessed to the client, so that the client locates at least one entity word from the page to be accessed based on the entity word data and performs clickable processing on the entity word to obtain a target page.
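By way of illustration only, the entity word data exchanged between the server and the client could be represented as below. The disclosure states only that the data contains word elements and their positioning information; the field names (`url`, `text`, `context`) and the use of JSON as the wire format are assumptions made for this sketch.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class WordElement:
    text: str     # the word element, e.g. an entity name
    context: str  # positioning information; here, the sentence containing it


@dataclass
class EntityWordData:
    url: str                     # address information of the page
    elements: List[WordElement]  # word elements with positioning information

    def to_json(self) -> str:
        """Serialize for sending from the server to the client."""
        return json.dumps(asdict(self), ensure_ascii=False)

    @staticmethod
    def from_json(payload: str) -> "EntityWordData":
        """Rebuild the structure on the client side."""
        raw = json.loads(payload)
        return EntityWordData(raw["url"],
                              [WordElement(**e) for e in raw["elements"]])
```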
According to an embodiment of the present disclosure, the page processing apparatus 600 further includes a page acquisition module, a first data extraction module, and a data storage module.
The page acquisition module is configured to acquire a plurality of pages.
The first data extraction module is configured to perform feature data extraction based on the page text of a page to obtain entity word data of the page.
The data storage module is configured to store the entity word data of the page in a database, using the address information of the page as an index.
According to an embodiment of the disclosure, the first data extraction module includes a word segmentation sub-module, a screening sub-module, a positioning sub-module, and a data generation sub-module.
The word segmentation sub-module is configured to perform word segmentation on the page text of the page to obtain a plurality of word segments.
The screening sub-module is configured to screen at least one word element from the plurality of word segments based on the part of speech of each word segment.
The positioning sub-module is configured to determine, from the page text, respective positioning information of the at least one word element.
The data generation sub-module is configured to obtain the entity word data of the page based on the at least one word element and the respective positioning information of the at least one word element.
According to an embodiment of the present disclosure, the positioning sub-module includes a sentence determination unit and an information determination unit.
The sentence determination unit is configured to determine, from the page text, a target sentence in which the word element is located.
The information determination unit is configured to obtain context information of the word element based on the target sentence.
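By way of illustration only, the extraction pipeline (word segmentation, part-of-speech screening, and context determination from the target sentence) can be sketched as follows. The part-of-speech lexicon `TOY_POS` is a hypothetical stand-in for a real tagger, the regular-expression tokenizer suits space-delimited text only, and keeping only the first occurrence of each word is a simplification of this sketch.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical part-of-speech lexicon; a real system would use a tagger.
TOY_POS: Dict[str, str] = {
    "Mars": "noun", "planet": "noun", "is": "verb", "red": "adj", "a": "det",
}


def split_sentences(text: str) -> List[str]:
    """Split text into sentences at common terminators."""
    parts = re.findall(r"[^.!?。！？]+[.!?。！？]?", text)
    return [p.strip() for p in parts if p.strip()]


def extract_entity_word_data(page_text: str,
                             keep_pos: Tuple[str, ...] = ("noun",)) -> List[dict]:
    """Return word elements with the sentence that contains them as context."""
    results: List[dict] = []
    seen = set()
    for sentence in split_sentences(page_text):
        for segment in re.findall(r"\w+", sentence):      # word segmentation
            if TOY_POS.get(segment) not in keep_pos:      # part-of-speech screening
                continue
            if segment in seen:                           # keep first occurrence only
                continue
            seen.add(segment)
            results.append({"word": segment, "context": sentence})
    return results
```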
According to an embodiment of the present disclosure, the page processing apparatus 600 further includes a type identification module and a second data extraction module.
The type identification module is configured to perform type identification on the page to obtain an identification result.
The second data extraction module is configured to perform feature data extraction based on the page text of the page to obtain the entity word data of the page in a case where the identification result indicates that the type of the page is a preset type.
According to an embodiment of the present disclosure, the entity word determination module 610 includes a data retrieval sub-module.
The data retrieval sub-module is configured to perform data retrieval in the database, using the address information of the page to be accessed as an index, to obtain the entity word data of the page to be accessed.
According to an embodiment of the present disclosure, the data retrieval sub-module includes a data retrieval unit and a data filtering unit.
The data retrieval unit is configured to perform data retrieval in the database, using the address information of the page to be accessed as an index, to obtain initial entity word data of the page to be accessed.
The data filtering unit is configured to perform data processing on the initial entity word data to obtain the entity word data of the page to be accessed.
According to an embodiment of the present disclosure, the data filtering unit includes a score calculating subunit, an element screening subunit, and a first data determining subunit.
The score calculating subunit is configured to obtain respective click rate scores of at least one initial word element included in the initial entity word data, where the click rate scores of the initial word elements are obtained by performing click rate estimation on the initial word elements.
The element screening subunit is configured to screen at least one target word element from the at least one initial word element based on the respective click rate scores of the at least one initial word element.
The first data determining subunit is configured to obtain entity word data of the page to be accessed based on the at least one target word element and the respective positioning information of the at least one target word element.
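By way of illustration only, screening by click rate score might be implemented as below. The disclosure does not specify the screening rule; the threshold and top-k cutoff, the dictionary of precomputed scores, and the `"word"` key used to identify a word element are all assumptions of this sketch.

```python
from typing import Dict, List


def screen_by_click_rate(elements: List[dict], scores: Dict[str, float],
                         threshold: float = 0.1, top_k: int = 5) -> List[dict]:
    """Keep the highest-scoring word elements.

    `scores` maps a word element to its estimated click rate; elements
    without a score are treated as 0.0 (an assumption of this sketch).
    """
    scored = [(scores.get(e["word"], 0.0), e) for e in elements]
    kept = [pair for pair in scored if pair[0] >= threshold]
    kept.sort(key=lambda pair: pair[0], reverse=True)  # best score first
    return [element for _, element in kept[:top_k]]
```

Each retained element keeps its positioning information, so the filtered list can be returned to the client unchanged in structure.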
According to an embodiment of the present disclosure, the data filtering unit includes an element ordering subunit and a second data determining subunit.
The element ordering subunit is configured to order the at least one initial word element included in the initial entity word data based on the text content of the page.
The second data determining subunit is configured to obtain the entity word data of the page based on the ordered at least one word element and the respective positioning information of the ordered at least one word element.
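By way of illustration only, one plausible reading of "ordering based on the text content of the page" is ordering by position of first appearance. The sketch below assumes each word element is a dictionary with a `"word"` and a `"context"` sentence (hypothetical field names) and places elements whose context cannot be found at the end.

```python
from typing import List


def order_by_position(elements: List[dict], page_text: str) -> List[dict]:
    """Sort word elements by where they appear in the page text."""

    def position(element: dict) -> int:
        anchor = page_text.find(element["context"])
        if anchor == -1:
            return len(page_text)  # unknown context sorts last
        offset = element["context"].find(element["word"])
        return anchor + max(offset, 0)

    # sorted() is stable, so ties keep their original relative order.
    return sorted(elements, key=position)
```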
According to an embodiment of the present disclosure, the page processing apparatus 600 further includes a text acquisition module and a third data extraction module.
The text acquisition module is used for acquiring the page text of the page to be accessed based on the address information of the page to be accessed under the condition that the entity word data of the page to be accessed cannot be retrieved from the database.
The third data extraction module is configured to perform feature data extraction based on the page text of the page to be accessed to obtain the entity word data of the page to be accessed.
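By way of illustration only, retrieval indexed by address information with a fallback to on-demand extraction can be sketched as follows. An in-memory dictionary stands in for the database, the `fetch_text` and `extract` callables are hypothetical hooks supplied by the caller, and storing the freshly extracted data back into the database is an assumption that the disclosure does not state.

```python
from typing import Callable, Dict, List


class EntityWordStore:
    """Entity word data indexed by page address, with fallback extraction."""

    def __init__(self, fetch_text: Callable[[str], str],
                 extract: Callable[[str], List[dict]]) -> None:
        self._db: Dict[str, List[dict]] = {}  # stand-in for the database
        self._fetch_text = fetch_text         # acquires page text from a URL
        self._extract = extract               # feature data extraction

    def put(self, url: str, data: List[dict]) -> None:
        """Store entity word data using the page address as the index."""
        self._db[url] = data

    def get(self, url: str) -> List[dict]:
        """Retrieve entity word data; extract on demand if none is stored."""
        data = self._db.get(url)
        if data is None:
            page_text = self._fetch_text(url)
            data = self._extract(page_text)
            self._db[url] = data  # assumption: cache the result for later requests
        return data
```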
According to embodiments of the present disclosure, the present disclosure also provides an electronic device, a readable storage medium and a computer program product.
According to an embodiment of the present disclosure, an electronic device includes at least one processor and a memory communicatively coupled to the at least one processor, wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method as described above.
According to an embodiment of the present disclosure, a non-transitory computer-readable storage medium stores computer instructions for causing a computer to perform the method as described above.
According to an embodiment of the present disclosure, a computer program product includes a computer program which, when executed by a processor, implements the method as described above.
Fig. 7 illustrates a schematic block diagram of an example electronic device 700 that may be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are meant to be exemplary only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 7, the device 700 includes a computing unit 701 that can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 702 or a computer program loaded from a storage unit 708 into a Random Access Memory (RAM) 703. The RAM 703 may also store various programs and data required for the operation of the device 700. The computing unit 701, the ROM 702, and the RAM 703 are connected to each other through a bus 704. An input/output (I/O) interface 705 is also connected to the bus 704.
The various components in device 700 are connected to input/output (I/O) interfaces 705, including an input unit 706, such as a keyboard, mouse, etc., an output unit 707, such as various types of displays, speakers, etc., a storage unit 708, such as a magnetic disk, optical disk, etc., and a communication unit 709, such as a network card, modem, wireless communication transceiver, etc. The communication unit 709 allows the device 700 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunication networks.
The computing unit 701 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of computing unit 701 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, etc. The computing unit 701 performs the respective methods and processes described above, for example, a page processing method. For example, in some embodiments, the page processing method may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as storage unit 708. In some embodiments, part or all of the computer program may be loaded and/or installed onto device 700 via ROM 702 and/or communication unit 709. When a computer program is loaded into the RAM 703 and executed by the computing unit 701, one or more steps of the page processing method described above may be performed. Alternatively, in other embodiments, the computing unit 701 may be configured to perform the page processing method by any other suitable means (e.g., by means of firmware).
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuit systems, field-programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), application-specific standard products (ASSPs), systems on chip (SoCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor, operable to receive data and instructions from, and to transmit data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for carrying out methods of the present disclosure may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general-purpose computer, special-purpose computer, or other programmable data processing apparatus such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine, or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a Read Only Memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to the user and a keyboard and a pointing device (e.g., a mouse or a trackball) by which the user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user, for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback), and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include a Local Area Network (LAN), a Wide Area Network (WAN), and the Internet.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server incorporating a blockchain.
It should be appreciated that steps may be reordered, added, or deleted using the various forms of flow shown above. For example, the steps recited in the present disclosure may be performed in parallel, sequentially, or in a different order, provided that the desired results of the technical solutions of the present disclosure are achieved; no limitation is imposed herein.
The above detailed description should not be taken as limiting the scope of the present disclosure. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present disclosure are intended to be included within the scope of the present disclosure.

Claims (23)

1. A page processing method, comprising:
in response to a page access request, sending address information of a page to be accessed to a server, so as to acquire entity word data of the page to be accessed returned by the server;
locating at least one entity word in the page to be accessed based on the entity word data; and
performing clickable processing on the entity word to obtain a target page.
2. The method of claim 1, wherein the entity word data includes at least one word element and positioning information of each of the at least one word element, the at least one word element corresponding respectively to the at least one entity word;
wherein the locating at least one entity word in the page to be accessed based on the entity word data includes:
determining a target text portion from the page text of the page to be accessed based on the positioning information of the word element; and
locating, based on the word element, the entity word corresponding to the word element in the target text portion.
3. The method of claim 2, wherein the positioning information includes context information of the word element;
wherein the determining a target text portion from the page text of the page to be accessed based on the positioning information of the word element includes:
performing text matching in the page text of the page to be accessed based on the context information of the word element to obtain a target sentence corresponding to the context information of the word element, thereby obtaining the target text portion.
4. The method of claim 2, wherein the positioning information includes a page display style of the server and an initial positioning feature of the word element;
wherein the determining a target text portion from the page text of the page to be accessed based on the positioning information of the word element includes:
adjusting the initial positioning feature of the word element based on the page display style of the server and a page display style of the client to obtain a target positioning feature; and
matching the target text portion from the page text of the page to be accessed based on the target positioning feature.
5. The method according to any one of claims 2-4, wherein the locating, based on the word element, the entity word corresponding to the word element in the target text portion includes:
performing keyword matching on the target text portion, using the word element as a keyword, to obtain a matching result; and
locating, based on the matching result, the entity word corresponding to the word element in the target text portion.
6. The method of claim 2, wherein the at least one word element is ordered based on the text content of the page to be accessed;
wherein the determining a target text portion from the page text of the page to be accessed based on the positioning information of the word element includes:
for the i-th word element, pruning the page text based on the position, in the page text, of the entity word corresponding to the (i-1)-th word element to obtain a pruned page text, where i is an integer greater than 1; and
determining the target text portion from the pruned page text based on the positioning information of the i-th word element.
7. The method of claim 1, wherein the performing clickable processing on the entity word to obtain a target page includes:
acquiring address information of a search page for the entity word;
generating a page tag based on the address information; and
modifying the entity word with the page tag, so as to perform clickable processing on the entity word and obtain the target page.
8. The method of claim 1, further comprising:
in response to a click operation on a target entity word in the target page, displaying a search page that uses the target entity word as a search keyword.
9. A page processing method, comprising:
in response to receiving address information of a page to be accessed from a client, determining entity word data of the page to be accessed based on the address information of the page to be accessed; and
sending the entity word data of the page to be accessed to the client, so that the client locates at least one entity word in the page to be accessed based on the entity word data and performs clickable processing on the entity word to obtain a target page.
10. The method of claim 9, further comprising:
acquiring a plurality of pages;
performing feature data extraction based on the page text of a page to obtain entity word data of the page; and
storing the entity word data of the page in a database, using the address information of the page as an index.
11. The method of claim 10, wherein the performing feature data extraction based on the page text of the page to obtain entity word data of the page includes:
performing word segmentation on the page text of the page to obtain a plurality of word segments;
screening at least one word element from the plurality of word segments based on the part of speech of each word segment;
determining, from the page text, respective positioning information of the at least one word element; and
obtaining the entity word data of the page based on the at least one word element and the respective positioning information of the at least one word element.
12. The method of claim 11, wherein the positioning information includes context information of the word element;
wherein the determining, from the page text, respective positioning information of the at least one word element includes:
determining, from the page text, a target sentence in which the word element is located; and
obtaining the context information of the word element based on the target sentence.
13. The method of claim 10, further comprising:
performing type identification on the page to obtain an identification result; and
performing feature data extraction based on the page text of the page to obtain the entity word data of the page in a case where the identification result indicates that the type of the page is a preset type.
14. The method according to any one of claims 9-13, wherein the determining entity word data of the page to be accessed based on the address information of the page to be accessed includes:
performing data retrieval in a database, using the address information of the page to be accessed as an index, to obtain the entity word data of the page to be accessed.
15. The method according to any one of claims 9-13, wherein the performing data retrieval in a database, using the address information of the page to be accessed as an index, to obtain the entity word data of the page to be accessed includes:
performing data retrieval in a database, using the address information of the page to be accessed as an index, to obtain initial entity word data of the page to be accessed; and
performing data processing on the initial entity word data to obtain the entity word data of the page to be accessed.
16. The method of claim 15, wherein the performing data processing on the initial entity word data to obtain the entity word data of the page to be accessed includes:
acquiring respective click rate scores of at least one initial word element included in the initial entity word data, wherein the click rate score of an initial word element is obtained by performing click rate estimation on the initial word element;
screening at least one target word element from the at least one initial word element based on the respective click rate scores of the at least one initial word element; and
obtaining the entity word data of the page to be accessed based on the at least one target word element and the respective positioning information of the at least one target word element.
17. The method of claim 15, wherein the performing data processing on the initial entity word data to obtain the entity word data of the page to be accessed includes:
ordering at least one initial word element included in the initial entity word data based on the text content of the page; and
obtaining the entity word data of the page based on the ordered at least one word element and the respective positioning information of the ordered at least one word element.
18. The method of any one of claims 14-17, further comprising:
in a case where the entity word data of the page to be accessed cannot be retrieved from the database, acquiring the page text of the page to be accessed based on the address information of the page to be accessed; and
performing feature data extraction based on the page text of the page to be accessed to obtain the entity word data of the page to be accessed.
19. A page processing apparatus, comprising:
an information sending module configured to send, in response to a page access request, address information of a page to be accessed to a server, so as to acquire entity word data of the page to be accessed returned by the server;
an entity word positioning module configured to locate at least one entity word in the page to be accessed based on the entity word data; and
an entity word processing module configured to perform clickable processing on the entity word to obtain a target page.
20. A page processing apparatus, comprising:
an entity word determining module configured to determine, in response to receiving address information of a page to be accessed from a client, entity word data of the page to be accessed based on the address information of the page to be accessed; and
an entity word sending module configured to send the entity word data of the page to be accessed to the client, so that the client locates at least one entity word in the page to be accessed based on the entity word data and performs clickable processing on the entity word to obtain a target page.
21. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor, wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-18.
22. A non-transitory computer readable storage medium storing computer instructions for causing the computer to perform the method of any one of claims 1-18.
23. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any of claims 1-18.
CN202510571820.4A 2025-04-30 2025-04-30 Page processing method, page processing device, electronic device, storage medium and program product Pending CN120372117A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202510571820.4A CN120372117A (en) 2025-04-30 2025-04-30 Page processing method, page processing device, electronic device, storage medium and program product

Publications (1)

Publication Number Publication Date
CN120372117A true CN120372117A (en) 2025-07-25

Family

ID=96444714

Country Status (1)

Country Link
CN (1) CN120372117A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination