CN112016017A - Method and device for determining characteristic data

Info

Publication number: CN112016017A
Application number: CN201910470654.3A
Authority: CN
Inventors: 龚小冬
Original assignee: Beijing Jingdong Century Trading Co Ltd; Beijing Jingdong Shangke Information Technology Co Ltd
Current assignee: Beijing Jingdong Century Trading Co Ltd; Beijing Jingdong Shangke Information Technology Co Ltd
Priority date: 2019-05-31
Filing date: 2019-05-31
Publication date: 2020-12-01

Abstract

The invention discloses a method and a device for determining characteristic data, and relates to the technical field of computers. One embodiment of the method comprises: acquiring tag information of a current page and portrait information of a target user accessing the current page; determining a related user list corresponding to the target user according to the portrait information of the target user, and acquiring an access page list corresponding to the related user list; selecting a target page from the access page list according to the tag information of the current page, and acquiring a feature data pool corresponding to the target page; and determining, in combination with the portrait information of the target user, feature data matching the target user from the feature data pool. In this embodiment, the target page corresponding to the user and the matching feature data are determined from the tag information of the current page and the portrait information of the user, so that the feature data can be acquired quickly when the user accesses the target page, which improves the response performance of the page and reduces its response time.

Description

Method and apparatus for determining characteristic data

Technical Field

The present invention relates to the field of computer technology, and in particular to a method and apparatus for determining feature data.

Background

In recent years, with the rapid development of computer technology and the Internet, users have come to expect better loading performance from the web pages they visit. If a page does not load fast enough, the user may simply close it, which reduces page visits. To meet user needs, current pages contain a large amount of feature information related to those needs, which makes the overall page content very large and degrades loading performance. How to determine feature data for different users is therefore of significant research interest.

At present, when a user accesses a page that has feature data, either a feature data interface is called in real time to obtain the corresponding feature data, or the feature data of all pages and all users is cached in advance based on user-page relationships. In the course of implementing the present invention, the inventor found at least the following problems in the prior art. First, the response time of the feature data depends on the feature data interface that is called, and the time spent on feature data for a whole page is proportional to the amount of feature data on the page, so the response time of the whole page increases and the user experience suffers. Second, caching everything requires a very large storage space, and caching based on user-page relationships has a very low hit rate.

Summary of the Invention

In view of this, embodiments of the present invention provide a method and apparatus for determining feature data, which can acquire feature data quickly when a user accesses a target page, thereby improving page response performance and reducing response time.

To achieve the above object, according to a first aspect of the embodiments of the present invention, a method for determining feature data is provided.

A method for determining feature data according to an embodiment of the present invention includes: acquiring tag information of a current page and portrait information of a target user accessing the current page; determining, according to the portrait information of the target user, a related user list corresponding to the target user, and acquiring an access page list corresponding to the related user list; selecting a target page from the access page list according to the tag information of the current page, and acquiring a feature data pool corresponding to the target page; and determining, in combination with the portrait information of the target user, feature data matching the target user from the feature data pool.

Optionally, determining the related user list corresponding to the target user according to the portrait information of the target user includes: obtaining a user set from a user page access database; for each user in the user set, obtaining the portrait information of that user, and calculating the user relevance between the target user and that user according to the portrait information of the target user and the portrait information of that user; and selecting the related user list corresponding to the target user from the user set according to the user relevance between the target user and each user, in combination with a preset user relevance threshold.

Optionally, selecting the target page from the access page list according to the tag information of the current page includes: for each accessed page in the access page list, obtaining the tag information of that page; calculating the page relevance between the current page and each accessed page according to the tag information of the current page and the tag information of each accessed page; and selecting the target page from the access page list according to the page relevance between the current page and each accessed page, in combination with a preset page relevance threshold.

Optionally, calculating the page relevance between the current page and each accessed page according to the tag information of the current page and the tag information of each accessed page includes: determining the category information and identification information in the tag information of the current page, and the category information and identification information in the tag information of each accessed page; calculating a first page relevance between the current page and each accessed page according to the category information of the current page and the category information of each accessed page; calculating a second page relevance between the current page and each accessed page according to the identification information of the current page and the identification information of each accessed page; and calculating the page relevance between the current page and each accessed page according to the first page relevance and the second page relevance.

Optionally, after the feature data matching the target user is determined from the feature data pool, the method further includes: caching the feature data; and, upon receiving a request from the target user to access the target page, directly loading the cached feature data.
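The optional caching step can be illustrated with a minimal sketch. It assumes an in-memory dictionary keyed by a (user, page) pair; the names `precache_features`, `load_features`, and `fetch_live`, and the fallback to a live call on a cache miss, are hypothetical and are not specified by the source.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical in-memory cache keyed by (user_id, page_id). A production
# system would more likely use an external cache with an expiry policy.
FeatureCache = Dict[Tuple[str, str], List[str]]


def precache_features(cache: FeatureCache, user_id: str, page_id: str,
                      features: List[str]) -> None:
    """Store feature data determined in advance for a predicted target page."""
    cache[(user_id, page_id)] = list(features)


def load_features(cache: FeatureCache, user_id: str, page_id: str,
                  fetch_live: Callable[[str, str], List[str]]) -> List[str]:
    """Return cached feature data if present.

    Falling back to a live interface call on a miss is an assumption of
    this sketch, not something the source describes.
    """
    key = (user_id, page_id)
    if key in cache:
        return cache[key]
    return fetch_live(user_id, page_id)
```

With this layout, a request for a correctly predicted target page is served from the dictionary without calling the feature data interface.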

To achieve the above object, according to a second aspect of the embodiments of the present invention, an apparatus for determining feature data is provided.

An apparatus for determining feature data according to an embodiment of the present invention includes: an acquisition module, configured to acquire tag information of a current page and portrait information of a target user accessing the current page; a first determination module, configured to determine, according to the portrait information of the target user, a related user list corresponding to the target user, and to acquire an access page list corresponding to the related user list; a selection module, configured to select a target page from the access page list according to the tag information of the current page, and to acquire a feature data pool corresponding to the target page; and a second determination module, configured to determine, in combination with the portrait information of the target user, feature data matching the target user from the feature data pool.

Optionally, the first determination module is further configured to: obtain a user set from a user page access database; for each user in the user set, obtain the portrait information of that user, and calculate the user relevance between the target user and that user according to the portrait information of the target user and the portrait information of that user; and select the related user list corresponding to the target user from the user set according to the user relevance between the target user and each user, in combination with a preset user relevance threshold.

Optionally, the selection module is further configured to: for each accessed page in the access page list, obtain the tag information of that page; calculate the page relevance between the current page and each accessed page according to the tag information of the current page and the tag information of each accessed page; and select the target page from the access page list according to the page relevance between the current page and each accessed page, in combination with a preset page relevance threshold.

Optionally, the selection module is further configured to: determine the category information and identification information in the tag information of the current page, and the category information and identification information in the tag information of each accessed page; calculate a first page relevance between the current page and each accessed page according to the category information of the current page and the category information of each accessed page; calculate a second page relevance between the current page and each accessed page according to the identification information of the current page and the identification information of each accessed page; and calculate the page relevance between the current page and each accessed page according to the first page relevance and the second page relevance.

Optionally, the apparatus further includes a cache module, configured to: cache the feature data; and, upon receiving a request from the target user to access the target page, directly load the cached feature data.

To achieve the above object, according to a third aspect of the embodiments of the present invention, an electronic device is provided.

An electronic device according to an embodiment of the present invention includes: one or more processors; and a storage device for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the method for determining feature data according to the embodiments of the present invention.

To achieve the above object, according to a fourth aspect of the embodiments of the present invention, a computer-readable medium is provided.

A computer-readable medium according to an embodiment of the present invention has a computer program stored thereon which, when executed by a processor, implements the method for determining feature data according to the embodiments of the present invention.

An embodiment of the above invention has the following advantages or beneficial effects. The target page corresponding to the target user can be determined from the tag information of the current page and the portrait information of the target user accessing the current page; the feature data pool of the target page is then obtained; and finally the feature data matching the target user is selected from the feature data pool in combination with the portrait information of the target user. The feature data can therefore be obtained quickly when the target user accesses the target page. This replaces the real-time call to the feature data interface used in the prior art, shortens the processing time of the feature data, and thus improves page response performance and reduces response time. In addition, by introducing the portrait information of the target user and the tag information of the current page, the corresponding target page is determined from the page the target user is currently accessing, and the feature data of the target page is obtained in advance. Compared with caching all feature data for pages numbering in the hundreds of millions, this greatly saves storage space and reduces cost. Furthermore, introducing user portrait information and page tag information improves the accuracy of determining the target page throughout the process, raises the hit rate of the feature data, and further improves page response performance.

Further effects of the above non-conventional optional implementations are described below in conjunction with specific embodiments.

Brief Description of the Drawings

The accompanying drawings are provided for a better understanding of the present invention and do not constitute an undue limitation of it. In the drawings:

FIG. 1 is a schematic diagram of the main steps of a method for determining feature data according to an embodiment of the present invention;

FIG. 2 is a schematic diagram of the category information relationship of pages according to an embodiment of the present invention;

FIG. 3 is a schematic diagram of the main steps of a method for determining feature data according to an embodiment of the present invention;

FIG. 4 is a schematic diagram of the main modules of an apparatus for determining feature data according to an embodiment of the present invention;

FIG. 5 is an exemplary system architecture diagram to which an embodiment of the present invention may be applied;

FIG. 6 is a schematic structural diagram of a computer system suitable for implementing a terminal device or a server according to an embodiment of the present invention.

Detailed Description

Exemplary embodiments of the present invention are described below with reference to the accompanying drawings. Various details of the embodiments are included to aid understanding and should be regarded as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications can be made to the embodiments described herein without departing from the scope and spirit of the invention. Likewise, descriptions of well-known functions and structures are omitted from the following description for clarity and conciseness.

A page that contains feature data also contains static data. Static data mainly includes the cascading style sheets and scripts on the page and static web page fragments with low real-time requirements. Because static data is presented identically to all users and does not change within a certain period of time, it can be cached in advance. This reduces page rendering work on the server, reduces back-to-origin handling of requests, lowers server processing pressure, and improves the response speed and performance of requests.

Feature data is recommendation data determined from the portrait information of the visiting user, and the recommendation data differs to some extent across pages and time periods. Prior-art methods for determining feature data either depend on the feature data interface that is called, which increases the response time of the whole page, or cache everything, which requires a very large storage space and, because the caching is based on user-page relationships, yields a very low hit rate. To address the loading performance of pages with feature data, the present invention therefore proposes a method that introduces the portrait information of the target user accessing the current page and the tag information of the current page, determines the target page to be accessed in the future from the page the target user is currently accessing, and obtains the feature data of the target page in advance. When the target user then accesses the target page, the feature data can be obtained quickly to complete the page display, which improves page loading performance.

FIG. 1 is a schematic diagram of the main steps of a method for determining feature data according to an embodiment of the present invention. As one reference embodiment of the present invention, as shown in FIG. 1, the main steps of the method for determining feature data may include steps S101 to S104.

Step S101: Obtain the tag information of the current page and the portrait information of the target user accessing the current page.

In the embodiments of the present invention, the current page is the page the user is visiting, and the tag information of the current page may include the category information and the identification information of the page. The target user is the user who is accessing the current page, and the portrait information of the target user may include the user's unique identifier, basic information, browsing information, operation behavior information, and the like. The purpose of the present invention is to determine feature data, that is, the data the target user will be interested in next. It is therefore necessary to combine the portrait information of the target user with the tag information of the current page to determine the page the target user will access next, from which the feature data of that page, that is, the data the user is interested in, can be determined.

Step S102: Determine the related user list corresponding to the target user according to the portrait information of the target user, and obtain the access page list corresponding to the related user list.

In the embodiments of the present invention, before the feature data is determined, the multiple target pages that the target user may visit next need to be determined. These target pages are obtained from the page access lists of users who are highly relevant to the target user. The related user list corresponding to the target user therefore needs to be calculated first, that is, the users highly relevant to the target user are identified, and these users form the related user list.

As yet another reference embodiment of the present invention, determining the related user list corresponding to the target user according to the portrait information of the target user may include steps S1021, S1022, and S1023.

Step S1021: Obtain a user set from the user page access database. The user page access database stores the access relationship data between users and pages, and the user set can be filtered out of this database.

Step S1022: For each user in the user set, obtain the portrait information of that user, and calculate the user relevance between the target user and that user according to the portrait information of the target user and the portrait information of that user. In other words, this step uses portrait information to calculate the user relevance between each user in the user set and the target user.

Websites currently already collect users' portrait information. As described in step S101, portrait information may include the user's unique identifier, basic information, browsing information, operation behavior information, and the like. The relevance between users can therefore be calculated from this portrait information, as follows.

A user $u_i$ can be defined through portrait information as follows:

$u_i = [x_{i1}\ \ldots\ x_{il}\ \ldots\ x_{iN}]^{T}, \quad 1 \le l \le N$

where $x_{il}$ is an item of the user's portrait information, so that the user is described by N different items of portrait information. The relevance between users, $R_{user}(u_i, u_j)$, can be expressed as:

where a larger value of $R_{user}(u_i, u_j)$ indicates a higher relevance between users $u_i$ and $u_j$.
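The expression for $R_{user}$ is not reproduced in this text, so the following is only an illustrative sketch and not the patent's formula. It assumes each user's portrait information has already been encoded as N numeric values and uses cosine similarity as a stand-in measure, chosen because a larger value indicates more closely related users, as described above. The function name `user_relevance` and the numeric encoding are hypothetical.

```python
import math
from typing import Sequence


def user_relevance(u_i: Sequence[float], u_j: Sequence[float]) -> float:
    """Cosine similarity between two portrait vectors (an assumed measure)."""
    if len(u_i) != len(u_j):
        raise ValueError("portrait vectors must have the same length N")
    dot = sum(a * b for a, b in zip(u_i, u_j))
    norm_i = math.sqrt(sum(a * a for a in u_i))
    norm_j = math.sqrt(sum(b * b for b in u_j))
    if norm_i == 0.0 or norm_j == 0.0:
        # An all-zero portrait vector carries no information to compare.
        return 0.0
    return dot / (norm_i * norm_j)
```

Any other similarity measure with the same "larger means more related" property could be substituted without changing the surrounding steps.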

Step S1023: Select the related user list corresponding to the target user from the user set according to the user relevance between the target user and each user, in combination with a preset user relevance threshold.

The user relevance threshold here may be a numeric value. For example, if the relevance between the target user and a user A is greater than this value, user A is a related user of the target user and can be added to the related user list. Alternatively, the user relevance threshold may be a count: the relevance values between each user in the user set and the target user are sorted, and the users corresponding to the M highest relevance values form the related user list of the target user. For example, the values of $R_{user}(u_i, u_j)$ are sorted in descending order, and the top M most relevant users are taken to construct the related user list $U_t$ of the target user.
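Both readings of the threshold, a minimum relevance value or a count M, can be sketched in one helper. This is a minimal illustration assuming the relevance values have already been computed and stored in a dictionary; the function and parameter names are hypothetical.

```python
from typing import Dict, List, Optional


def select_related_users(relevance: Dict[str, float],
                         min_relevance: Optional[float] = None,
                         top_m: Optional[int] = None) -> List[str]:
    """Pick related users by a value threshold, a top-M count, or both.

    `relevance` maps each candidate user ID to its relevance with respect
    to the target user.
    """
    # Sort candidates by relevance, highest first.
    ranked = sorted(relevance.items(), key=lambda kv: kv[1], reverse=True)
    if min_relevance is not None:
        # Value threshold: keep users whose relevance exceeds the value.
        ranked = [(user, score) for user, score in ranked if score > min_relevance]
    if top_m is not None:
        # Count threshold: keep only the M most relevant users.
        ranked = ranked[:top_m]
    return [user for user, _ in ranked]
```

For example, `select_related_users({"a": 0.9, "b": 0.4, "c": 0.7}, top_m=2)` keeps users `a` and `c`.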

As can be seen from steps S1021 to S1023, when determining the related user list of the target user, the embodiment of the present invention takes the portrait information of the target user into account and builds the related user list from users whose portrait information is similar to that of the target user, which improves the accuracy of determining the target page. After the related user list of the target user is obtained, the access page list corresponding to the related user list can be obtained directly, that is, the access page lists of the users in the related user list are extracted directly from the user list database. Note that time information can be taken into account when extracting these access page lists. For example, if it is currently 1 p.m., the pages that the users in the related user list visited at 1 p.m. during a recent period can be extracted, where the recent period can be set according to the actual situation, for example the past week. Other information may of course also be considered when extracting the access page lists, which is not limited here.
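The time-aware extraction described above can be sketched as follows. The record layout `(user_id, page_id, visited_at)`, the function name, and the choice to match on the same hour of day are assumptions made for illustration; the source only gives the 1 p.m. and one-week example.

```python
from datetime import datetime, timedelta
from typing import Iterable, List, Set, Tuple

# Hypothetical access record: (user_id, page_id, visited_at).
Visit = Tuple[str, str, datetime]


def visited_pages(visits: Iterable[Visit], related_users: Set[str],
                  now: datetime, lookback_days: int = 7) -> List[str]:
    """Pages visited by related users in the same hour of day as `now`,
    within the last `lookback_days` days, in first-seen order."""
    earliest = now - timedelta(days=lookback_days)
    pages: List[str] = []
    seen: Set[str] = set()
    for user_id, page_id, visited_at in visits:
        if user_id not in related_users:
            continue
        if not (earliest <= visited_at <= now):
            continue  # outside the "recent" window
        if visited_at.hour != now.hour:
            continue  # different time of day
        if page_id not in seen:
            seen.add(page_id)
            pages.append(page_id)
    return pages
```

The lookback window is a parameter so that "recent" can be tuned to the actual situation, as the text suggests.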

Step S103: Select a target page from the access page list according to the tag information of the current page, and obtain the feature data pool corresponding to the target page.

After the access page list is obtained in step S102, the target page can be selected from it in combination with the tag information of the current page. As another reference embodiment of the present invention, selecting the target page from the access page list according to the tag information of the current page may include steps S1031 to S1033.

Step S1031: For each accessed page in the access page list, obtain the tag information of that page. As mentioned in step S101 above, the tag information of a page may include the category information and the identification information of the page, where the category information refers to the category to which the page belongs. FIG. 2 is a schematic diagram of the category information relationship of pages according to an embodiment of the present invention. As shown in FIG. 2, the levels of the figure correspond in turn to the first-level page category, the second-level page category, the third-level page category, and so on, and the lowest level (the leaf level) represents the pages. For page p1 in FIG. 2, the first-level page category is category root (the root node), the second-level page category is category 1, and the third-level page category is category 11. Taking the category levels of a library management system as an example, the first-level category may be Chinese books; the second-level categories may be philosophy, social sciences, natural sciences, and comprehensive books; and the third-level categories under the second-level category philosophy may include basic problems of philosophy, materialism and idealism, and so on. The identification information of a page may include basic information such as the unique identifier of the page and the attribute information of the page. Because the current page accessed by the target user is related to the page the target user accesses next, the embodiment of the present invention considers the tag information of the current page when determining the feature data, which can improve accuracy.

步骤S1032：根据当前页面的标签信息和每个访问页面的标签信息，计算当前页面与每个访问页面之间的页面相关度。在上文步骤S101和步骤S1031中均提到页面的标签信息可以包括页面的类目信息和页面的标识信息。因此，本发明实施例的当前页面与每个访问页面之间的页面相关度就是利用页面的类目信息和页面的标识信息计算得到的。本发明实施例中计算当前页面与每个访问页面之间的页面相关度，可以包括：Step S1032: Calculate the page correlation between the current page and each visited page according to the label information of the current page and the label information of each visited page. The tag information of the page mentioned in both steps S101 and S1031 above may include category information of the page and identification information of the page. Therefore, the page correlation between the current page and each visited page in the embodiment of the present invention is calculated by using the category information of the page and the identification information of the page. In the embodiment of the present invention, calculating the page relevancy between the current page and each visited page may include:

步骤S10321：确定当前页面的标签信息中的类目信息和标识信息，以及每个访问页面的标签信息中的类目信息和标识信息，其中类目信息和标识信息在上文中已经具体解释，此处不在累述。Step S10321: Determine the category information and identification information in the tag information of the current page, and the category information and identification information in the tag information of each accessed page, wherein the category information and identification information have been specifically explained above, this Tired everywhere.

步骤S10322：根据当前页面的类目信息和每个访问页面的类目信息，计算当前页面与每个访问页面之间的第一页面相关度，其中第一页面相关度的计算公式如下：Step S10322: Calculate the first page correlation between the current page and each visited page according to the category information of the current page and the category information of each visited page, wherein the calculation formula of the first page correlation is as follows:

Here, d(p_i) denotes the path length from page p_i to the first-level page category (that is, the root node), and L(p_i, p_j) denotes the common ancestor of pages p_i and p_j that has the longest path to the root. For example, if the first common parent category of page p1 and page p2 is "category 11", that category has the longest path to the root node "category root" among their common ancestors, so "category 11" is the longest-path common ancestor of p1 and p2. The first page relevance R₁(p_i, p_j) between two pages takes values in [0, 1]; when p_i = p_j, the relevance reaches its maximum of 1.
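The quantities d(p_i) and L(p_i, p_j) defined above can be computed on a category tree as in the following minimal Python sketch. The formula for R₁ is not reproduced in this text, so the sketch assumes a Wu-Palmer-style form, 2·d(L(p_i, p_j)) / (d(p_i) + d(p_j)), which is consistent with the stated range [0, 1] and with the value 1 when p_i = p_j; it may differ from the formula actually used. The tree contents and all function names are hypothetical.

```python
# Hypothetical category tree: child -> parent; the root has parent None.
PARENT = {
    "root": None,
    "category1": "root",
    "category11": "category1",
    "category12": "category1",
    "p1": "category11",
    "p2": "category11",
    "p3": "category12",
}


def path_to_root(node):
    """Return [node, parent, ..., root]."""
    path = []
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path


def depth(node):
    """d(node): number of edges from node up to the root."""
    return len(path_to_root(node)) - 1


def deepest_common_ancestor(a, b):
    """L(a, b): the common ancestor with the longest path to the root."""
    ancestors_of_a = set(path_to_root(a))
    # Walking upward from b, the first shared node is the deepest one.
    for node in path_to_root(b):
        if node in ancestors_of_a:
            return node
    raise ValueError("nodes are not in the same tree")


def category_relevance(a, b):
    """Assumed form of R1: 2*d(L(a, b)) / (d(a) + d(b))."""
    total = depth(a) + depth(b)
    if total == 0:  # both nodes are the root itself
        return 1.0
    return 2 * depth(deepest_common_ancestor(a, b)) / total
```

Under this assumed form, two pages in the same leaf category score higher than two pages that only share a higher-level category, which matches the intent described for the category information.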

Step S10323: Calculate the second page relevance between the current page and each visited page according to the identification information of the current page and the identification information of each visited page. In the embodiment of the present invention, the second page relevance between two pages is calculated in the same way as the user relevance between two users: the identification information of the pages (such as a unique identifier and attribute information) is combined, and an ordinary relevance calculation is applied to obtain the second page relevance R₀(p_i, p_j). The details are not repeated here.

Step S10324: Calculate the page relevance between the current page and each visited page according to the first page relevance and the second page relevance.

Once the first page relevance R₁(p_i, p_j) between the current page and each visited page has been calculated in step S10322, and the second page relevance R₀(p_i, p_j) has been calculated in step S10323, the page relevance between the current page and each visited page can be calculated as R(p_i, p_j) = R₀(p_i, p_j) * R₁(p_i, p_j).
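The product R = R₀ * R₁ can be illustrated with a short Python sketch. The text only says that R₀ comes from an "ordinary relevance calculation" over identification information, so the Jaccard overlap of attribute sets used here is purely an assumption, as are the function names.

```python
def identification_relevance(attrs_a, attrs_b):
    """Stand-in for R0: Jaccard overlap of two attribute sets (an assumption)."""
    if not attrs_a and not attrs_b:
        return 1.0  # two empty attribute sets are treated as identical
    return len(attrs_a & attrs_b) / len(attrs_a | attrs_b)


def page_relevance(r0, r1):
    """R = R0 * R1, as stated in the text."""
    return r0 * r1
```

If R₀ also lies in [0, 1], as it does for the Jaccard stand-in, the product stays in [0, 1] and is high only when both the category-based and the identification-based relevance are high.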

Step S1033: Select the target page from the access page list according to the page relevance between the current page and each visited page, in combination with a preset page relevance threshold.

The page relevance threshold here may be a numeric value: for example, if the relevance between the current page and a visited page W is greater than this value, the visited page W is a target page. Note that the number of target pages is at least one. Alternatively, the page relevance threshold may be a count K: the visited pages in the access page list are sorted by their page relevance to the current page, and the visited pages with the K highest relevance values are all taken as target pages.
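The two readings of the threshold described above (a cut-off value, or a count K) can be sketched in Python as follows. The function names are hypothetical, and the fallback that returns the single best page when nothing exceeds the cut-off is only one assumed way to satisfy the "at least one target page" remark.

```python
def select_by_threshold(relevance_by_page, threshold):
    """Keep pages whose relevance is strictly greater than the threshold."""
    selected = [p for p, r in relevance_by_page.items() if r > threshold]
    if not selected and relevance_by_page:
        # Assumed fallback so that at least one target page is returned.
        selected = [max(relevance_by_page, key=relevance_by_page.get)]
    return selected


def select_top_k(relevance_by_page, k):
    """Keep the k pages with the highest relevance, best first."""
    ranked = sorted(relevance_by_page, key=relevance_by_page.get, reverse=True)
    return ranked[:k]
```

A fixed cut-off yields a variable number of target pages, while the top-K form bounds how much feature data is prefetched per request.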

Step S103 also states that, after the target page is obtained, the feature data pool corresponding to the target page needs to be acquired. The feature data pool can be regarded as a large database that stores the feature data corresponding to each page, with each page corresponding to at least one piece of feature data. Because the feature data recommended differs from user to user, the feature data recommended to two users may differ even when both open the link to the same page.

Therefore, in step S104, the feature data matching the target user is determined from the feature data pool in combination with the portrait information of the target user. For example, for page p1 the feature data corresponding to male users is D1 and that corresponding to female users is D2; for page p2 the feature data corresponding to male users is D3 and that corresponding to female users is D4. If the current target user is male, he obtains recommendation data D1 for page p1 and recommendation data D3 for page p2.
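The p1/p2 example above maps directly onto a nested lookup. The following Python sketch models the feature data pool as a dictionary keyed by page and then by a single portrait attribute; the dictionary layout, the "gender" key, and the function name are assumptions made for illustration, and a real portrait would carry more attributes.

```python
# Feature data pool from the example: page -> portrait attribute value -> feature data.
FEATURE_POOL = {
    "p1": {"male": "D1", "female": "D2"},
    "p2": {"male": "D3", "female": "D4"},
}


def match_feature_data(target_pages, portrait, pool=FEATURE_POOL):
    """Return {page: feature data} for entries matching the user's portrait."""
    gender = portrait["gender"]
    return {
        page: pool[page][gender]
        for page in target_pages
        if gender in pool.get(page, {})
    }
```

For a male target user and target pages p1 and p2, the lookup yields D1 and D3, as in the example.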

The purpose of determining the feature data in the embodiment of the present invention is to allow the feature data to be obtained quickly when the target user opens the target page, which speeds up the page response and improves the response performance of the page. Therefore, as a further reference embodiment, after the feature data matching the target user is determined from the feature data pool, the method for determining feature data may further include: caching the feature data; and, on receiving a request from the target user to access the target page, directly loading the cached feature data. In other words, the feature data is cached so that, when the target user accesses the target page, the feature data corresponding to that page can be obtained quickly from the cache.
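The cache-then-load behaviour can be sketched in Python as follows. The in-memory dictionary stands in for whatever cache service an implementation would use, and the class, the function names, and the fall-back to a live fetch on a cache miss are all assumptions; the text itself only describes the cache-hit path.

```python
class FeatureCache:
    """In-memory cache keyed by (user_id, page_id); a stand-in for a real cache service."""

    def __init__(self):
        self._store = {}

    def put(self, user_id, page_id, feature_data):
        self._store[(user_id, page_id)] = feature_data

    def get(self, user_id, page_id):
        return self._store.get((user_id, page_id))


def handle_page_request(cache, user_id, page_id, fetch_live):
    """Serve cached feature data if present; otherwise fall back to a live fetch."""
    cached = cache.get(user_id, page_id)
    if cached is not None:
        return cached, "cache"
    # Assumed miss path: call the real-time feature data interface.
    return fetch_live(user_id, page_id), "live"
```

Keying the cache by both user and page reflects the point that the same page may carry different feature data for different users.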

Note that at least one target page is determined in the embodiment of the present invention, so at least one piece of feature data is finally determined. That is, the method for determining feature data proceeds as follows. First, the user page access database is obtained, and the portrait information of the target user, the user set in the database, and the portrait information of the users in that set are used to calculate the relevance between each user in the set and the target user; the access page list corresponding to the users with higher relevance is then obtained. Next, the tag information of the pages is used to calculate the relevance between the current page and each page in the access page list, and at least one page with the highest relevance is determined as a target page. Finally, the feature data of the at least one target page is obtained in combination with the portrait information of the target user, and when the target user accesses a target page, the feature data is returned directly.

FIG. 3 is a schematic diagram of the main steps of a method for determining feature data according to an embodiment of the present invention. As shown in FIG. 3, the main steps of the method may include:

Step S301: Acquire the tag information of the current page and the portrait information of the target user accessing the current page;

Step S302: Obtain the user set from the user page access database, and calculate the user relevance between each user in the user set and the target user, where the method for calculating the user relevance has been explained in detail in step S1022 above and is not repeated here;

Step S303: According to the user relevance between the target user and each user, in combination with a preset user relevance threshold, select the related user list corresponding to the target user from the user set, and obtain the access page list corresponding to the related user list;

Step S304: Calculate the page relevance between each visited page in the access page list and the current page, where the method for calculating the page relevance has been explained in detail in steps S10321 to S10324 above and is not repeated here;

Step S305: According to the page relevance between the current page and each visited page, in combination with a preset page relevance threshold, select the target page from the access page list, and acquire the feature data pool corresponding to the target page;

Step S306: In combination with the portrait information of the target user, determine the feature data matching the target user from the feature data pool;

Step S307: Cache the feature data, and directly load the cached feature data on receiving a request from the target user to access the target page.

Note that there is at least one target page in steps S301 to S307 above, and that the number of pieces of feature data determined to match the target user according to the user's portrait information is the same as the number of target pages. That is, one piece of feature data is finally determined for each target page, and when the target user accesses a given target page, the feature data corresponding to that page can be obtained directly from the cache.
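Steps S302 to S307 can be strung together as in the following minimal Python sketch. The data layout (`access_db`, `feature_pool`), the relevance callables passed in as parameters, the use of strict "greater than" thresholds, and the single "gender" portrait attribute are all assumptions made for illustration; the relevance calculations themselves are supplied by the caller.

```python
def prefetch_feature_data(current_page, target_portrait, access_db, feature_pool,
                          user_relevance, page_relevance,
                          user_threshold, page_threshold):
    """Return {target_page: feature_data} to cache for the target user."""
    # S302-S303: keep users relevant enough to the target user and
    # collect the de-duplicated list of pages those users visited.
    visited_pages = []
    for record in access_db.values():
        if user_relevance(target_portrait, record["portrait"]) > user_threshold:
            for page in record["pages"]:
                if page not in visited_pages:
                    visited_pages.append(page)

    # S304-S305: keep visited pages relevant enough to the current page.
    target_pages = [p for p in visited_pages
                    if page_relevance(current_page, p) > page_threshold]

    # S306-S307: pick the pool entry matching the portrait; the returned
    # mapping is what would be cached for later page requests.
    gender = target_portrait["gender"]
    cache = {}
    for page in target_pages:
        pool = feature_pool.get(page, {})
        if gender in pool:
            cache[page] = pool[gender]
    return cache


# Toy data, hypothetical throughout.
access_db = {
    "u1": {"portrait": {"gender": "male"}, "pages": ["p1", "p2"]},
    "u2": {"portrait": {"gender": "female"}, "pages": ["p3"]},
}
feature_pool = {
    "p1": {"male": "D1", "female": "D2"},
    "p2": {"male": "D3", "female": "D4"},
    "p3": {"male": "D5", "female": "D6"},
}


def same_gender(a, b):
    """Toy user relevance: 1.0 for the same gender, else 0.0."""
    return 1.0 if a["gender"] == b["gender"] else 0.0


def toy_page_relevance(current, page):
    """Toy page relevance looked up from a fixed table."""
    return {"p1": 0.8, "p2": 0.3, "p3": 0.9}[page]


result = prefetch_feature_data("p0", {"gender": "male"}, access_db, feature_pool,
                               same_gender, toy_page_relevance, 0.5, 0.5)
```

With this toy data only user u1 is related to the target user, and of u1's pages only p1 passes the page threshold, so a single piece of feature data is prefetched for a single target page.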

According to the technical solution for determining feature data in the embodiment of the present invention, the target page corresponding to the target user can be determined from the tag information of the current page and the portrait information of the target user accessing the current page, and the feature data pool of the target page can be acquired. The feature data matching the target user is then selected from the feature data pool in combination with the portrait information of the target user, so that the feature data can be obtained quickly when the target user accesses the target page. This replaces the real-time call to the feature data interface used in the prior art, reduces the processing time for feature data, and thereby improves the response performance of the page and shortens the response time. In addition, by introducing the portrait information of the target user and the tag information of the current page, the corresponding target page is determined from the page the target user is currently accessing, and the feature data of the target page is acquired in advance; compared with caching all the feature data of hundreds of millions of pages, this greatly saves storage space and reduces cost. Furthermore, introducing user portrait information and page tag information improves the accuracy of determining the target page throughout the process, raises the hit rate of the feature data, and further improves the response performance of the page.

FIG. 4 is a schematic diagram of the main modules of an apparatus for determining feature data according to an embodiment of the present invention. As shown in FIG. 4, the apparatus 400 for determining feature data mainly includes the following modules: an acquisition module 401, a first determination module 402, a selection module 403, and a second determination module 404.

The acquisition module 401 may be used to acquire the tag information of the current page and the portrait information of the target user accessing the current page. The first determination module 402 may be used to determine the related user list corresponding to the target user according to the portrait information of the target user, and to obtain the access page list corresponding to the related user list. The selection module 403 may be used to select the target page from the access page list according to the tag information of the current page, and to acquire the feature data pool corresponding to the target page. The second determination module 404 may be used to determine the feature data matching the target user from the feature data pool in combination with the portrait information of the target user.

In the embodiment of the present invention, the first determination module 402 may be further used to: obtain the user set from the user page access database; for each user in the user set, obtain the portrait information of that user, and calculate the user relevance between the target user and that user according to the portrait information of the target user and the portrait information of that user; and, according to the user relevance between the target user and each user, in combination with a preset user relevance threshold, select the related user list corresponding to the target user from the user set.

In the embodiment of the present invention, the selection module 403 may be further used to: for each visited page in the access page list, obtain the tag information of that visited page; calculate the page relevance between the current page and each visited page according to the tag information of the current page and the tag information of each visited page; and, according to the page relevance between the current page and each visited page, in combination with a preset page relevance threshold, select the target page from the access page list.

In the embodiment of the present invention, the selection module 403 may be further used to: determine the category information and identification information in the tag information of the current page, and the category information and identification information in the tag information of each visited page; calculate the first page relevance between the current page and each visited page according to the category information of the current page and the category information of each visited page; calculate the second page relevance between the current page and each visited page according to the identification information of the current page and the identification information of each visited page; and calculate the page relevance between the current page and each visited page according to the first page relevance and the second page relevance.

In the embodiment of the present invention, the apparatus further includes a cache module (not shown in the figure), which may be used to: cache the feature data; and, on receiving a request from the target user to access the target page, directly load the cached feature data.

As can be seen from the above description, the apparatus for determining feature data according to the embodiment of the present invention can determine the target page corresponding to the target user from the tag information of the current page and the portrait information of the target user accessing the current page, and can acquire the feature data pool of the target page. The feature data matching the target user is then selected from the feature data pool in combination with the portrait information of the target user, so that the feature data can be obtained quickly when the target user accesses the target page. This replaces the real-time call to the feature data interface used in the prior art, reduces the processing time for feature data, and thereby improves the response performance of the page and shortens the response time. In addition, by introducing the portrait information of the target user and the tag information of the current page, the corresponding target page is determined from the page the target user is currently accessing, and the feature data of the target page is acquired in advance; compared with caching all the feature data of hundreds of millions of pages, this greatly saves storage space and reduces cost. Furthermore, introducing user portrait information and page tag information improves the accuracy of determining the target page throughout the process, raises the hit rate of the feature data, and further improves the response performance of the page.

FIG. 5 shows an exemplary system architecture 500 to which the method or apparatus for determining feature data according to an embodiment of the present invention may be applied.

As shown in FIG. 5, the system architecture 500 may include terminal devices 501, 502, and 503, a network 504, and a server 505. The network 504 is the medium that provides communication links between the terminal devices 501, 502, 503 and the server 505. The network 504 may include various connection types, such as wired or wireless communication links or fiber optic cables.

A user may use the terminal devices 501, 502, 503 to interact with the server 505 through the network 504, for example to receive or send messages. Various communication client applications may be installed on the terminal devices 501, 502, 503, such as shopping applications, web browser applications, search applications, instant messaging tools, email clients, and social platform software (examples only).

The terminal devices 501, 502, 503 may be various electronic devices that have a display screen and support web browsing, including but not limited to smartphones, tablet computers, laptop computers, and desktop computers.

The server 505 may be a server that provides various services, for example a background management server that supports the shopping websites users browse with the terminal devices 501, 502, 503 (example only). The background management server may analyze and otherwise process received data such as product information query requests, and feed the processing results (for example, targeted push information or product information; examples only) back to the terminal devices.

It should be noted that the method for determining feature data provided by the embodiment of the present invention is generally executed by the server 505, and accordingly the apparatus for determining feature data is generally provided in the server 505.

It should be understood that the numbers of terminal devices, networks, and servers in FIG. 5 are merely illustrative. There may be any number of terminal devices, networks, and servers, as required by the implementation.

Reference is now made to FIG. 6, which shows a schematic structural diagram of a computer system 600 suitable for implementing a terminal device according to an embodiment of the present invention. The terminal device shown in FIG. 6 is only an example and should not impose any limitation on the functions or scope of use of the embodiments of the present invention.

As shown in FIG. 6, the computer system 600 includes a central processing unit (CPU) 601, which can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 602 or a program loaded from a storage section 608 into a random access memory (RAM) 603. The RAM 603 also stores various programs and data required for the operation of the system 600. The CPU 601, the ROM 602, and the RAM 603 are connected to one another through a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.

The following components are connected to the I/O interface 605: an input section 606 including a keyboard, a mouse, and the like; an output section 607 including a cathode ray tube (CRT) or liquid crystal display (LCD), a speaker, and the like; a storage section 608 including a hard disk and the like; and a communication section 609 including a network interface card such as a LAN card or a modem. The communication section 609 performs communication processing via a network such as the Internet. A drive 610 is also connected to the I/O interface 605 as needed. A removable medium 611, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 610 as needed, so that a computer program read from it can be installed into the storage section 608 as needed.

In particular, according to the disclosed embodiments of the present invention, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, the disclosed embodiments include a computer program product comprising a computer program carried on a computer-readable medium, the computer program containing program code for performing the method shown in the flowchart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 609, and/or installed from the removable medium 611. When the computer program is executed by the central processing unit (CPU) 601, the above-described functions defined in the system of the present invention are performed.

It should be noted that the computer-readable medium described in the present invention may be a computer-readable signal medium, a computer-readable storage medium, or any combination of the two. A computer-readable storage medium may be, for example but without limitation, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples of computer-readable storage media may include, but are not limited to: an electrical connection with one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the present invention, a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in connection with an instruction execution system, apparatus, or device. A computer-readable signal medium, by contrast, may include a data signal propagated in baseband or as part of a carrier wave and carrying computer-readable program code. Such a propagated data signal may take a variety of forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium that can send, propagate, or transmit a program for use by or in connection with an instruction execution system, apparatus, or device. Program code contained on a computer-readable medium may be transmitted over any suitable medium, including but not limited to wireless, wire, optical cable, RF, or any suitable combination of the above.

The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present invention. In this regard, each block in a flowchart or block diagram may represent a module, a program segment, or a portion of code that contains one or more executable instructions for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur in an order different from that noted in the figures. For example, two blocks shown in succession may in fact be executed substantially in parallel, or they may sometimes be executed in the reverse order, depending on the functionality involved. It should also be noted that each block of the block diagrams or flowcharts, and combinations of blocks in the block diagrams or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.

The modules described in the embodiments of the present invention may be implemented in software or in hardware. The described modules may also be provided in a processor, which may be described, for example, as: a processor including an acquisition module, a first determination module, a selection module, and a second determination module. The names of these modules do not, in some cases, limit the modules themselves; for example, the acquisition module may also be described as "a module for acquiring the tag information of the current page and the portrait information of the target user accessing the current page".

In another aspect, the present invention also provides a computer-readable medium, which may be included in the device described in the above embodiments, or may exist separately without being assembled into the device. The computer-readable medium carries one or more programs which, when executed by the device, cause the device to: acquire the tag information of the current page and the portrait information of the target user accessing the current page; determine, according to the portrait information of the target user, the related user list corresponding to the target user, and obtain the access page list corresponding to the related user list; select the target page from the access page list according to the tag information of the current page, and acquire the feature data pool corresponding to the target page; and determine the feature data matching the target user from the feature data pool in combination with the portrait information of the target user.

According to the technical solution of the embodiment of the present invention, the target page corresponding to the target user can be determined from the tag information of the current page and the portrait information of the target user accessing the current page, and the feature data pool of the target page can be acquired. The feature data matching the target user is then selected from the feature data pool in combination with the portrait information of the target user, so that the feature data can be obtained quickly when the target user accesses the target page. This replaces the real-time call to the feature data interface used in the prior art, reduces the processing time for feature data, and thereby improves the response performance of the page and shortens the response time. In addition, by introducing the portrait information of the target user and the tag information of the current page, the corresponding target page is determined from the page the target user is currently accessing, and the feature data of the target page is acquired in advance; compared with caching all the feature data of hundreds of millions of pages, this greatly saves storage space and reduces cost. Furthermore, introducing user portrait information and page tag information improves the accuracy of determining the target page throughout the process, raises the hit rate of the feature data, and further improves the response performance of the page.

The above specific embodiments do not limit the protection scope of the present invention. Those skilled in the art should understand that various modifications, combinations, sub-combinations, and substitutions may be made depending on design requirements and other factors. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.

Claims

1. A method for determining feature data, characterized by comprising:

acquiring label information of a current page and portrait information of a target user accessing the current page;

determining, according to the portrait information of the target user, a related user list corresponding to the target user, and acquiring a visited page list corresponding to the related user list;

selecting a target page from the visited page list according to the label information of the current page, and acquiring a feature data pool corresponding to the target page; and

determining, in combination with the portrait information of the target user, feature data matching the target user from the feature data pool.

2. The method according to claim 1, wherein determining the related user list corresponding to the target user according to the portrait information of the target user comprises:

acquiring a user set from a user page access database;

for each user in the user set, acquiring portrait information of the user, and calculating a user relevance between the target user and the user according to the portrait information of the target user and the portrait information of the user; and

selecting the related user list corresponding to the target user from the user set according to the user relevance between the target user and each user, in combination with a preset user relevance threshold.
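
By way of example only, the selection recited in claim 2 could be sketched as below. The claim does not fix how user relevance is computed; this sketch assumes a portrait is a mapping from attribute name to value and scores relevance as the fraction of attributes on which two portraits agree. The names `user_relevance` and `select_related_users` are hypothetical.

```python
from typing import Dict, List

Portrait = Dict[str, str]  # hypothetical form: attribute name -> attribute value


def user_relevance(a: Portrait, b: Portrait) -> float:
    """Assumed measure: share of all attributes on which both portraits agree."""
    keys = set(a) | set(b)
    if not keys:
        return 0.0
    matches = sum(1 for k in keys if k in a and k in b and a[k] == b[k])
    return matches / len(keys)


def select_related_users(
    target: Portrait, user_set: Dict[str, Portrait], threshold: float
) -> List[str]:
    """Keep users at or above the threshold, most relevant first."""
    scored = [(user_relevance(target, p), uid) for uid, p in user_set.items()]
    ranked = sorted(scored, key=lambda s: (-s[0], s[1]))
    return [uid for score, uid in ranked if score >= threshold]
```

Ordering the result by descending relevance is a design choice of the sketch, not a requirement of the claim; it simply makes it easy to truncate the list if a cap on related users is desired.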

3. The method according to claim 1, wherein selecting the target page from the visited page list according to the label information of the current page comprises:

for each visited page in the visited page list, acquiring label information of the visited page;

calculating a page relevance between the current page and each visited page according to the label information of the current page and the label information of each visited page; and

selecting the target page from the visited page list according to the page relevance between the current page and each visited page, in combination with a preset page relevance threshold.

4. The method according to claim 3, wherein calculating the page relevance between the current page and each visited page according to the label information of the current page and the label information of each visited page comprises:

determining category information and identification information in the label information of the current page, and category information and identification information in the label information of each visited page;

calculating a first page relevance between the current page and each visited page according to the category information of the current page and the category information of each visited page;

calculating a second page relevance between the current page and each visited page according to the identification information of the current page and the identification information of each visited page; and

calculating the page relevance between the current page and each visited page according to the first page relevance and the second page relevance.
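
For illustration only, the two-part page relevance of claim 4 and the threshold selection of claim 3 might be sketched as follows. None of the concrete choices come from the claims: category information is assumed to be a category path from root to leaf, identification information is assumed to be a set of identifiers, the first relevance is the shared path prefix length over the longer path length, the second is a Jaccard coefficient, and the two are combined by a weighted sum with an assumed default weight of 0.5.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Tuple


@dataclass(frozen=True)
class PageLabels:
    category: Tuple[str, ...]     # hypothetical category path, root first
    identifiers: FrozenSet[str]   # hypothetical identification information


def category_relevance(a: Tuple[str, ...], b: Tuple[str, ...]) -> float:
    """First page relevance (assumed): shared prefix length / longer path length."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 0.0
    shared = 0
    for x, y in zip(a, b):
        if x != y:
            break
        shared += 1
    return shared / longest


def identifier_relevance(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Second page relevance (assumed): Jaccard coefficient of the identifier sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def page_relevance(cur: PageLabels, other: PageLabels, category_weight: float = 0.5) -> float:
    """Combine both relevances with an assumed weighted sum."""
    first = category_relevance(cur.category, other.category)
    second = identifier_relevance(cur.identifiers, other.identifiers)
    return category_weight * first + (1.0 - category_weight) * second


def select_target_pages(
    cur: PageLabels, visited: Dict[str, PageLabels], threshold: float
) -> List[str]:
    """Keep visited pages whose combined relevance reaches the preset threshold."""
    return [pid for pid, labels in visited.items() if page_relevance(cur, labels) >= threshold]
```

A weighted sum keeps the contribution of each label component easy to tune; other combinations, such as a product or a minimum, would fit the claim wording equally well.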

5. The method according to claim 1, wherein after determining the feature data matching the target user from the feature data pool, the method further comprises:

caching the feature data; and

upon receiving a request from the target user to access the target page, directly loading the cached feature data.
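
As a minimal, non-limiting sketch of the caching step in claim 5, the class below stores matched feature data in an in-memory dictionary keyed by user and page and loads it directly on a later request. The class name, the key layout, and the fallback to a live fetch on a cache miss are all assumptions made for illustration; the claim itself only recites caching and direct loading.

```python
from typing import Callable, Dict, List, Tuple


class FeatureDataCache:
    """Hypothetical in-memory cache of prefetched feature data."""

    def __init__(self, fetch_live: Callable[[str, str], List[str]]) -> None:
        # fetch_live stands in for a real-time feature data interface (assumed).
        self._fetch_live = fetch_live
        self._store: Dict[Tuple[str, str], List[str]] = {}

    def put(self, user_id: str, page_id: str, feature_data: List[str]) -> None:
        """Cache the feature data matched for this user and target page."""
        self._store[(user_id, page_id)] = list(feature_data)

    def load(self, user_id: str, page_id: str) -> List[str]:
        """Load cached data directly; fall back to the live interface on a miss."""
        key = (user_id, page_id)
        if key in self._store:
            return self._store[key]
        return self._fetch_live(user_id, page_id)
```

In a deployment of this kind the dictionary would more plausibly be replaced by a shared cache service with an expiry policy, which the sketch leaves out.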

6. An apparatus for determining feature data, comprising:

an acquisition module, configured to acquire label information of a current page and portrait information of a target user accessing the current page;

a first determining module, configured to determine a related user list corresponding to the target user according to the portrait information of the target user, and acquire a visited page list corresponding to the related user list;

a selection module, configured to select a target page from the visited page list according to the label information of the current page, and acquire a feature data pool corresponding to the target page; and

a second determining module, configured to determine feature data matching the target user from the feature data pool in combination with the portrait information of the target user.

7. The apparatus according to claim 6, wherein the first determining module is further configured to:

acquire a user set from a user page access database;

8. The apparatus according to claim 6, wherein the selection module is further configured to:

select the target page from the visited page list according to the page relevance between the current page and each visited page, in combination with a preset page relevance threshold.

9. The apparatus according to claim 8, wherein the selection module is further configured to:

calculate a first page relevance between the current page and each visited page according to the category information of the current page and the category information of each visited page;

10. The apparatus according to claim 6, wherein the apparatus further comprises a cache module configured to:

cache the feature data;

11. An electronic device, characterized by comprising:

one or more processors; and

a storage device for storing one or more programs,

wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-5.

12. A computer-readable medium on which a computer program is stored, characterized in that the program, when executed by a processor, implements the method according to any one of claims 1-5.