CN111782932B - Method, device and computer readable storage medium for establishing data association - Google Patents
- Publication number
- CN111782932B (application number CN201911220575.3A)
- Authority
- CN
- China
- Prior art keywords
- order
- characteristic data
- data
- feature data
- initial
- Prior art date
- Legal status
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9535—Search customisation based on user profiles and personalisation
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/06—Buying, selling or leasing transactions
- G06Q30/0601—Electronic shopping [e-shopping]
- G06Q30/0631—Recommending goods or services
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Business, Economics & Management (AREA)
- Accounting & Taxation (AREA)
- Finance (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Development Economics (AREA)
- Economics (AREA)
- Marketing (AREA)
- Strategic Management (AREA)
- General Business, Economics & Management (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The present disclosure relates to a method, an apparatus, and a computer-readable storage medium for establishing data association, and relates to the technical field of data processing. The method comprises: combining the first-order feature data in the related description of an item to generate a plurality of higher-order feature data; determining the level of each higher-order feature data according to the number of first-order feature data it contains; taking the highest-level higher-order feature data as initial feature data, and determining the secondary feature data corresponding to each initial feature data, wherein each secondary feature data is one level lower than the corresponding initial feature data, and each initial feature data contains all the first-order feature data in the corresponding secondary feature data; establishing, for each initial feature data, an association relationship between the initial feature data and each corresponding secondary feature data; and repeating the process of establishing the association relationship with each secondary feature data as new initial feature data, until the initial feature data has been updated to first-order feature data.
Description
Technical Field
The present disclosure relates to the field of data processing technologies, and in particular, to a method for establishing a data association, an apparatus for establishing a data association, and a computer readable storage medium.
Background
With the increase of user demands, processing and analysis of mass data are often required. Establishing the association relationship between the data is beneficial to improving the efficiency of data processing and analysis. For example, the network platform may recommend item data having an association relationship to the item data browsed by the user based thereon. Such personalized recommendations are a key technical means to enhance the user experience.
In the related art, item data of interest to a user is extracted, and correlations among all item data are analyzed, so that an association relationship among the data is constructed.
Disclosure of Invention
The inventors of the present disclosure found that the above-described related art has the following problems: the association process has high complexity and large calculation amount, and the efficiency of data association is low.
In view of this, the present disclosure proposes a technical solution for establishing data association, which can improve efficiency of data association.
According to some embodiments of the present disclosure, there is provided a method for establishing a data association, including: combining the first-order feature data in the related description of an item to generate a plurality of higher-order feature data, wherein each higher-order feature data comprises a plurality of first-order feature data; determining the level of each higher-order feature data according to the number of first-order feature data it contains, wherein the higher the level, the more first-order feature data the higher-order feature data contains; taking the highest-level higher-order feature data as initial feature data, and determining the secondary feature data corresponding to each initial feature data, wherein each secondary feature data is one level lower than the corresponding initial feature data, and each initial feature data contains all the first-order feature data in the corresponding secondary feature data; establishing, for each initial feature data, an association relationship between the initial feature data and each corresponding secondary feature data; and repeating the process of establishing the association relationship with each secondary feature data as new initial feature data, until the initial feature data has been updated to first-order feature data.
In some embodiments, the method further comprises: counting the occurrence times of each first-order characteristic data and each higher-order characteristic data in the related description of each article similar to the article; adding corresponding first-order feature data or higher-order feature data into the candidate feature data set under the condition that the occurrence times are larger than the corresponding threshold value; judging whether each first-order characteristic data and each higher-order characteristic data belong to a candidate characteristic data set or not; and deleting the corresponding association relation under the condition that the first-order characteristic data or the higher-order characteristic data does not belong to the candidate characteristic data set.
In some embodiments, the respective threshold value of each first order feature data is greater than the respective threshold value of each higher order feature data, and the respective threshold value of the higher order feature data of a lower level is greater than the respective threshold value of the higher order feature data of a higher level.
In some embodiments, combining the first-order feature data in the related description of the item to generate a plurality of higher-order feature data includes: and combining the first-order characteristic data in the related description of the object according to a preset combination sequence to generate a plurality of high-order characteristic data.
In some embodiments, the method further comprises: establishing a knowledge graph with each first-order feature data and each higher-order feature data as nodes and the corresponding association relationships as edges, wherein each edge in the knowledge graph points from first-order feature data to higher-order feature data, or from lower-level higher-order feature data to higher-level higher-order feature data; and in response to a user searching for first-order feature data or higher-order feature data of an item, recommending corresponding higher-order feature data to the user according to the direction of the edges of the knowledge graph.
In some embodiments, the method further comprises: recommending the high-order feature data with an association relation with the first-order feature data to the user in response to the first-order feature data of the article searched by the user; and in response to the user searching the high-order feature data of the article, recommending the high-order feature data which has an association relation with the high-order feature data and has a higher level than the high-order feature data to the user.
According to further embodiments of the present disclosure, there is provided an apparatus for establishing a data association, including: a combining unit, configured to combine the first-order feature data in the related description of an item to generate a plurality of higher-order feature data, wherein each higher-order feature data comprises a plurality of first-order feature data; a level determining unit, configured to determine the level of each higher-order feature data according to the number of first-order feature data it contains, wherein the higher the level, the more first-order feature data the higher-order feature data contains; a feature determining unit, configured to take the highest-level higher-order feature data as initial feature data and determine the secondary feature data corresponding to each initial feature data, wherein each secondary feature data is one level lower than the corresponding initial feature data, and each initial feature data contains all the first-order feature data in the corresponding secondary feature data; and an establishing unit, configured to establish, for each initial feature data, an association relationship between the initial feature data and each corresponding secondary feature data, and to repeat the process of establishing the association relationship with each secondary feature data as new initial feature data, until the initial feature data has been updated to first-order feature data.
In some embodiments, the establishing unit counts the occurrence times of the first-order feature data and the higher-order feature data in the related description of each article similar to the article, adds the corresponding first-order feature data or higher-order feature data into the candidate feature data set under the condition that the occurrence times are larger than the corresponding threshold value, judges whether each first-order feature data and each higher-order feature data belong to the candidate feature data set, and deletes the corresponding association relationship under the condition that the first-order feature data or the higher-order feature data do not belong to the candidate feature data set.
In some embodiments, the respective threshold value of each first order feature data is greater than the respective threshold value of each higher order feature data, and the respective threshold value of the higher order feature data of a lower level is greater than the respective threshold value of the higher order feature data of a higher level.
In some embodiments, the combining unit combines the first-order feature data in the related description of the item according to a preset combination sequence to generate a plurality of high-order feature data.
In some embodiments, the establishing unit establishes a knowledge graph with each first-order feature data and each higher-order feature data as nodes and the corresponding association relationships as edges. Each edge in the knowledge graph points from first-order feature data to higher-order feature data, or from lower-level higher-order feature data to higher-level higher-order feature data.
In some embodiments, the apparatus further comprises a recommending unit for recommending corresponding high-order feature data to the user according to the direction of the edge of the knowledge-graph in response to the user searching for the first-order feature data or the high-order feature data of the item.
In some embodiments, the apparatus further includes a recommending unit that recommends higher-order feature data having an association relationship with first-order feature data of an item to a user in response to the user searching the first-order feature data; and in response to the user searching the high-order feature data of the article, recommending the high-order feature data which has an association relation with the high-order feature data and has a higher level than the high-order feature data to the user.
According to still further embodiments of the present disclosure, there is provided an apparatus for establishing a data association, including: a memory; and a processor coupled to the memory, the processor configured to perform the method of establishing a data association in any of the embodiments described above based on instructions stored in the memory.
According to still further embodiments of the present disclosure, there is provided a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the method of establishing a data association in any of the embodiments described above.
In the above embodiment, the association relationship of the feature data of the article is established from the high-level high-order feature data including a large amount of the first-order feature data. The high-level high-order characteristic data contains the information of the low-level high-order characteristic data, so that searching in the whole characteristic data set is not needed in the process of constructing the association relation. Thus, the time complexity of the association can be reduced, and the data association efficiency can be improved.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments of the disclosure and together with the description, serve to explain the principles of the disclosure.
The disclosure may be more clearly understood from the following detailed description taken in conjunction with the accompanying drawings in which:
FIG. 1 illustrates a flow chart of some embodiments of a method of establishing a data association of the present disclosure;
FIG. 2 illustrates a flow chart of further embodiments of a method of establishing a data association of the present disclosure;
FIG. 3 shows a schematic diagram of some embodiments of an established knowledge-graph of the present disclosure;
FIG. 4 illustrates a block diagram of some embodiments of a data association establishment apparatus of the present disclosure;
FIG. 5 illustrates a block diagram of further embodiments of a data association establishing apparatus of the present disclosure;
FIG. 6 illustrates a block diagram of still further embodiments of a data association establishing apparatus of the present disclosure.
Detailed Description
Various exemplary embodiments of the present disclosure will now be described in detail with reference to the accompanying drawings. It should be noted that: the relative arrangement of the components and steps, numerical expressions and numerical values set forth in these embodiments do not limit the scope of the present disclosure unless it is specifically stated otherwise.
Meanwhile, it should be understood that the sizes of the respective parts shown in the drawings are not drawn in actual scale for convenience of description.
The following description of at least one exemplary embodiment is merely illustrative in nature and is in no way intended to limit the disclosure, its application, or uses.
Techniques, methods, and apparatus known to one of ordinary skill in the relevant art may not be discussed in detail, but should be considered part of the granted specification where appropriate.
In all examples shown and discussed herein, any specific values should be construed as merely illustrative, and not a limitation. Thus, other examples of the exemplary embodiments may have different values.
It should be noted that: like reference numerals and letters denote like items in the following figures, and thus once an item is defined in one figure, no further discussion thereof is necessary in subsequent figures.
Fig. 1 illustrates a flow chart of some embodiments of a method of establishing a data association of the present disclosure.
As shown in fig. 1, the method includes: step 110, generating high-order characteristic data; step 120, determining the level of the high-order characteristic data; step 130, determining secondary characteristic data; step 140, establishing an association relationship; and step 150, updating the initial characteristic data.
In step 110, first order feature data from the associated descriptions of the item are combined to generate a plurality of higher order feature data. The high-order feature data includes a plurality of first-order feature data.
For example, the item is brand mobile phone A, and the first-order feature data in its related description includes: double-shot (dual camera), full-screen, and waterproof. In this case, the higher-order feature data that may be generated for brand mobile phone A includes: double-shot_full-screen, full-screen_waterproof, double-shot_waterproof, and double-shot_waterproof_full-screen.
For example, the item is brand mobile phone B, and the first-order feature data in its related description includes: double-card double-standby (dual SIM, dual standby), waterproof, and double-shot. In this case, the higher-order feature data that may be generated for brand mobile phone B includes: double-card double-standby_waterproof, double-card double-standby_double-shot, double-shot_waterproof, and double-card double-standby_double-shot_waterproof.
For example, the item is brand mobile phone C, and the first-order feature data in its related description includes: double-shot, 4G, and full-screen. In this case, the higher-order feature data that may be generated for brand mobile phone C includes: double-shot_full-screen, double-shot_4G, 4G_full-screen, and double-shot_4G_full-screen.
In some embodiments, the first-order feature data in the related description of the item are combined according to a preset combination order to generate a plurality of higher-order feature data. For example, double-shot_full-screen and full-screen_double-shot may be unified as double-shot_full-screen according to the combination order.
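As an illustration of step 110, the combination can be sketched in Python. This is a minimal sketch rather than the patented implementation: the function name `generate_higher_order`, the `_` separator, the cap of three orders, and the use of plain lexicographic sorting as the "preset combination order" are all assumptions made for the example.

```python
from itertools import combinations


def generate_higher_order(first_order, max_order=3, sep="_"):
    """Combine first-order features into higher-order features (orders 2..max_order).

    Assumption: sorting the first-order features lexicographically stands in
    for the "preset combination order", so each combination has one spelling.
    """
    ordered = sorted(set(first_order))
    result = []
    for order in range(2, min(max_order, len(ordered)) + 1):
        # combinations() preserves the sorted order inside every tuple
        for combo in combinations(ordered, order):
            result.append(sep.join(combo))
    return result


phone_a = ["double-shot", "full-screen", "waterproof"]
higher_a = generate_higher_order(phone_a)
print(higher_a)
```

Because the inputs are sorted before combining, passing the features in a different order (for example `["full-screen", "double-shot"]`) yields the same string `double-shot_full-screen`.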
In step 120, the level of each higher-order feature data is determined according to the number of first-order feature data included in each higher-order feature data. The higher-order feature data of higher rank contains a larger number of first-order feature data.
For example, higher-order feature data containing two first-order feature data, such as double-shot_full-screen, full-screen_waterproof, and double-shot_waterproof, may be determined as level-2 higher-order feature data, that is, second-order feature data; higher-order feature data containing three first-order feature data, such as double-shot_full-screen_waterproof, is determined as level-3 higher-order feature data, that is, third-order feature data.
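The level determination of step 120 amounts to counting the first-order components of each feature string. The following sketch assumes that features are joined with `_` and that no first-order feature name contains that separator; `level_of` and `group_by_level` are hypothetical helper names.

```python
def level_of(feature, sep="_"):
    """Level = number of first-order features contained in the feature."""
    return len(feature.split(sep))


def group_by_level(features, sep="_"):
    """Map each level to the list of features at that level."""
    levels = {}
    for feature in features:
        levels.setdefault(level_of(feature, sep), []).append(feature)
    return levels


features = [
    "double-shot_full-screen",
    "full-screen_waterproof",
    "double-shot_waterproof",
    "double-shot_full-screen_waterproof",
]
levels = group_by_level(features)
top_level = max(levels)  # level whose features serve as initial feature data
print(top_level, levels[top_level])
```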
In step 130, the highest-level higher-order feature data is taken as the initial feature data, and the secondary feature data corresponding to each initial feature data is determined. Each secondary feature data is one level lower than the corresponding initial feature data, and each initial feature data contains all the first-order feature data in the corresponding secondary feature data.
For example, with the third-order feature data double-shot_full-screen_waterproof as the initial feature data, it may be determined that the secondary feature data includes double-shot_full-screen, full-screen_waterproof, and double-shot_waterproof.
In step 140, an association between each initial feature data and each corresponding secondary feature data is established according to each initial feature data.
For example, association relationships may be established between double-shot_full-screen_waterproof and each of double-shot_full-screen, full-screen_waterproof, and double-shot_waterproof.
In step 150, steps 130 and 140 are repeated with each secondary feature data as new initial feature data, until the initial feature data has been updated to first-order feature data.
For example, double-shot_full-screen, full-screen_waterproof, and double-shot_waterproof may be taken as the updated initial feature data, with the first-order feature data as the candidate secondary feature data, and steps 130 and 140 repeated.
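Steps 130 to 150 can be sketched as a loop that starts from the highest-level feature data and derives each lower level by dropping one first-order component at a time. This is an illustrative sketch only: `build_associations`, the `_` separator, and the representation of an association as a `(initial, secondary)` tuple are assumptions.

```python
from itertools import combinations


def build_associations(top_features, sep="_"):
    """Derive (initial, secondary) associations level by level.

    Each secondary feature is obtained from its initial feature by removing
    one first-order component, so no search over the full feature set is needed.
    """
    associations = set()
    frontier = set(top_features)
    while frontier:
        next_frontier = set()
        for initial in frontier:
            parts = initial.split(sep)
            if len(parts) < 2:
                continue  # first-order feature data: nothing below it
            for sub in combinations(parts, len(parts) - 1):
                secondary = sep.join(sub)
                associations.add((initial, secondary))
                next_frontier.add(secondary)
        frontier = next_frontier  # secondary features become the new initial ones
    return associations


edges = build_associations(["double-shot_full-screen_waterproof"])
for initial, secondary in sorted(edges):
    print(initial, "->", secondary)
```

For a single third-order feature this yields three associations to second-order features and six associations from those to first-order features.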
Fig. 2 illustrates a flow chart of further embodiments of a method of establishing a data association of the present disclosure.
As shown in FIG. 2, the method may further include: step 210, counting the numbers of occurrences; step 220, establishing a candidate feature data set; step 230, judging whether the feature data belongs to the candidate feature data set; and step 240, deleting the corresponding association relationships.
In step 210, the number of occurrences of each first-order feature data and each higher-order feature data is counted in the related descriptions of the items in the same category as the item. For example, in the embodiment of brand mobile phones A, B, and C, the first-order feature data double-shot occurs 12 times and double-card double-standby occurs 4 times, while the second-order feature data double-shot_waterproof occurs 4 times and double-card double-standby_double-shot occurs 2 times.
In step 220, in case the number of occurrences is greater than the respective threshold, the respective first-order feature data or higher-order feature data is added to the candidate feature data set.
In some embodiments, the respective threshold value of each first order feature data is greater than the respective threshold value of each higher order feature data, and the respective threshold value of the higher order feature data of a lower level is greater than the respective threshold value of the higher order feature data of a higher level.
For example, the threshold may be set to 10 for first-order feature data, 3 for second-order feature data, and 1 for third-order feature data. In this case, the first-order feature data double-shot occurs more than 10 times and the second-order feature data double-shot_waterproof occurs more than 3 times, so the candidate feature data set includes double-shot and double-shot_waterproof; the first-order feature data double-card double-standby occurs fewer than 10 times and the second-order feature data double-card double-standby_double-shot occurs fewer than 3 times, so the candidate feature data set does not include double-card double-standby or double-card double-standby_double-shot.
In step 230, it is determined whether each first-order feature data and each higher-order feature data belongs to a candidate feature data set.
In step 240, in the case that the first-order feature data or the higher-order feature data does not belong to the candidate feature data set, the corresponding association relationship is deleted.
For example, since the candidate feature data set does not include double-card double-standby or double-card double-standby_double-shot, all association relationships involving them are deleted; since the candidate feature data set includes double-shot and double-shot_waterproof, the association relationships between them are retained.
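Steps 220 to 240 can be sketched with the occurrence counts and thresholds from the example above. The function names, the dictionary layout, and the three sample associations are assumptions for illustration; the sketch keeps an association only when both of its endpoints are in the candidate feature data set.

```python
def select_candidates(counts, thresholds, sep="_"):
    """Keep features whose occurrence count exceeds the threshold for their order."""
    candidates = set()
    for feature, occurrences in counts.items():
        order = len(feature.split(sep))
        if occurrences > thresholds[order]:
            candidates.add(feature)
    return candidates


def prune_associations(associations, candidates):
    """Delete every association with an endpoint outside the candidate set."""
    return {
        (higher, lower)
        for higher, lower in associations
        if higher in candidates and lower in candidates
    }


# Occurrence counts and thresholds taken from the example in the text
counts = {
    "double-shot": 12,
    "double-card double-standby": 4,
    "double-shot_waterproof": 4,
    "double-card double-standby_double-shot": 2,
}
thresholds = {1: 10, 2: 3, 3: 1}

candidates = select_candidates(counts, thresholds)

# Hypothetical associations, restricted to features with known counts
associations = {
    ("double-shot_waterproof", "double-shot"),
    ("double-card double-standby_double-shot", "double-shot"),
    ("double-card double-standby_double-shot", "double-card double-standby"),
}
kept = prune_associations(associations, candidates)
print(sorted(candidates))
print(sorted(kept))
```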
In some embodiments, in response to a user searching for first-order feature data of an item, high-order feature data having an association with the first-order feature data is recommended to the user; and in response to searching the high-order characteristic data of the article by the user, recommending the high-order characteristic data which has an association relation with the high-order characteristic data and has a higher level than the high-order characteristic data to the user.
For example, if the user searches the platform for a mobile phone with the double-shot function, mobile phones with both the double-shot and waterproof functions are recommended to the user according to the association relationship between double-shot and double-shot_waterproof.
In some embodiments, a knowledge graph is established with the first-order feature data and the higher-order feature data as nodes and the corresponding association relationships as edges. Each edge in the knowledge graph points from first-order feature data to higher-order feature data, or from lower-level higher-order feature data to higher-level higher-order feature data.
And in response to the user searching the first-order feature data or the high-order feature data of the object, recommending corresponding high-order feature data for the user according to the direction of the edge of the knowledge graph. For example, a knowledge graph as shown in fig. 3 may be established.
Fig. 3 shows a schematic diagram of some embodiments of the established knowledge-graph of the present disclosure.
As shown in FIG. 3, according to the method in any of the above embodiments, association relationships may be established between the third-order feature data waterproof_double-shot_full-screen and the corresponding second-order feature data waterproof_double-shot, waterproof_full-screen, and double-shot_full-screen; association relationships may likewise be established between second-order feature data and the corresponding first-order feature data, such as between waterproof_double-shot and each of waterproof and double-shot.
Nodes in the knowledge graph are established from the feature data, and edges from the association relationships, with each edge pointing from first-order feature data to second-order feature data, or from second-order feature data to third-order feature data. For example, the category may serve as the root node of the knowledge graph and point to all first-order feature data; corresponding category information may also be added to the nodes, such as "mobile phone" in the nodes in the figure.
In some embodiments, in response to a user searching for a waterproof mobile phone, waterproof double-shot mobile phones and waterproof full-screen mobile phones may be recommended to the user according to the corresponding node "waterproof: mobile phone" and the direction of its edges.
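The knowledge graph and the recommendation step can be sketched as a directed adjacency map in which edges run from lower-level to higher-level feature data, as described above. The names `build_graph` and `recommend`, the tuple layout, and the omission of category information on the nodes are assumptions made to keep the example short.

```python
from collections import defaultdict


def build_graph(associations):
    """Build adjacency sets with edges pointing from lower-level to higher-level features."""
    graph = defaultdict(set)
    for higher, lower in associations:
        graph[lower].add(higher)
    return graph


def recommend(graph, searched):
    """Follow outgoing edges from the searched feature to its higher-level neighbours."""
    return sorted(graph.get(searched, ()))


# (higher-level, lower-level) associations matching the FIG. 3 example
associations = [
    ("waterproof_double-shot_full-screen", "waterproof_double-shot"),
    ("waterproof_double-shot_full-screen", "waterproof_full-screen"),
    ("waterproof_double-shot_full-screen", "double-shot_full-screen"),
    ("waterproof_double-shot", "waterproof"),
    ("waterproof_double-shot", "double-shot"),
    ("waterproof_full-screen", "waterproof"),
    ("waterproof_full-screen", "full-screen"),
    ("double-shot_full-screen", "double-shot"),
    ("double-shot_full-screen", "full-screen"),
]
graph = build_graph(associations)
print(recommend(graph, "waterproof"))
print(recommend(graph, "waterproof_double-shot"))
```

Searching for a first-order feature returns the second-order features that contain it, and searching for a second-order feature returns the third-order feature above it.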
In some embodiments, the establishment method proposed by the present disclosure may include: combining the second-order concept and the third-order concept, generating a concept list, generating an edge candidate set, filtering the edge candidate set, encoding and outputting and the like.
In the step of combining the second-order and third-order concepts, first-order concepts (first-order feature data) are mainly combined to form a second-order and third-order combined concept (high-order feature data). This enriches the conceptual properties of the item, thereby generating a conceptual list of certain item words (e.g. cell phones).
In some embodiments, unreasonable concepts (feature data), such as concepts containing illegal characters, may first be filtered out; the first-order concepts are then combined to generate second-order and third-order concepts; finally, the first-order concepts within each second-order and third-order concept are arranged in dictionary order before output, which unifies different expressions of the same concept. For example, double-card double-standby_waterproof and waterproof_double-card double-standby are unified into waterproof_double-card double-standby.
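The filtering and normalization described here can be sketched as follows. The definition of an "illegal character" is not given in the text, so the regular expression below (anything other than word characters, hyphens, and spaces) is purely an assumption, as are the helper names. Note also that Python's default string ordering on the translated English names differs from the ordering of the original terms, so the canonical form produced here need not match the example in the text.

```python
import re

# Assumption: a concept is "illegal" if it contains anything other than
# word characters, hyphens, or spaces.
ILLEGAL_CHARS = re.compile(r"[^\w\- ]")


def is_legal(concept):
    """Return True for non-empty concepts without illegal characters."""
    return bool(concept) and ILLEGAL_CHARS.search(concept) is None


def canonical(parts, sep="_"):
    """Join first-order concepts in sorted order so each combination has one spelling."""
    return sep.join(sorted(parts))


raw = ["waterproof", "double-card double-standby", "water<proof>", ""]
legal = [c for c in raw if is_legal(c)]
print(legal)
print(canonical(["waterproof", "double-card double-standby"]))
print(canonical(["double-card double-standby", "waterproof"]))
```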
In the step of generating a concept list, performing de-duplication processing on all concepts to obtain an effective concept list; and simultaneously counting the occurrence times of each concept, and eliminating the concept with fewer occurrence times.
In some embodiments, the number of occurrences of each concept may first be counted using a hash table; concepts with fewer occurrences may then be filtered out according to the threshold corresponding to each order of concept; finally, the de-duplicated and filtered concept list is output.
For example, the probability of occurrence of the second-order concept and the third-order concept is low relative to the first-order concept, and thus, the respective thresholds of the respective order concepts may be set in such a manner that the coefficients are attenuated. For example, the corresponding threshold may be:
$$w_j = k \times v^{j}$$
where k is a hyperparameter set according to specific requirements and conditions, for example k = 0.5; v is an attenuation coefficient greater than 0 and less than 1 (for example v = 0.2), which can be set according to specific requirements and conditions or obtained through machine learning; and j is the order of the corresponding concept.
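A short numeric sketch of the attenuated threshold follows. It assumes the formula is read as w_j = k · v^j, that is, the attenuation coefficient raised to the power of the order; the extracted text is ambiguous on this point, and the function name `order_threshold` is hypothetical.

```python
def order_threshold(order, k=0.5, v=0.2):
    """Threshold for concepts of the given order, assuming w_j = k * v**j."""
    if not 0 < v < 1:
        raise ValueError("attenuation coefficient v must be between 0 and 1")
    return k * v ** order


# With k = 0.5 and v = 0.2 the threshold shrinks by a factor of v per order
thresholds = {j: order_threshold(j) for j in (1, 2, 3)}
for j, w in thresholds.items():
    print(j, w)
```

Under this reading, higher-order concepts, which occur less often, are held to a lower threshold than first-order concepts.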
In the generating the edge candidate set step, a "bottom-up" (generation from high-level feature data to low-level feature data) generating method may be employed. For example, edges of third-order concepts and second-order concepts may be generated from third-order concepts; edges are generated from the second order concepts to the first order concepts.
The "bottom-up" generation method is a linear-time generation method with a low computational cost. This is because a third-order concept itself carries the information of its second-order concepts, so when the association relationships are constructed "bottom-up", the information in the third-order concept can be used directly and a search over the whole concept set is avoided, which reduces the time complexity.
In some embodiments, it may happen that the relevant second-order concepts of edges are not present in the concept list, and some of these edges are redundant and need to be filtered out.
For example, for each third-order concept, 3 edges from the third-order concept to the second-order concept are generated, such as "waterproof_full net through_double-card double-standby" to "waterproof_full net through", "waterproof_double-card double-standby", "full net through_double-card double-standby" (the third-order concepts themselves have been arranged in dictionary order, so the corresponding second-order concepts do not need to be arranged again); and generating 2 edges to the first-order concepts according to each second-order concept.
In the edge candidate set filtering step, edges not in the concept list are mainly deleted, so that edges between all legal concepts are finally generated.
In some embodiments, it may be determined for each edge whether the concept therein is in the concept list, and if not, these edges are deleted. For example, the nodes of each edge may be subjected to join operations with the concept list; if the join operation result of a certain node is null, which means that the node is not in the legal concept list, the whole edge is deleted.
Since the join operation requires sorting, and the time complexity of sorting is $O(n \log n)$ while the time complexity of the filtering algorithm after sorting is $O(n)$, the overall time complexity is $O(n \log n)$, which is less than the $O(n^2)$ time complexity of the related art.
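A sort-then-merge filter of this kind might look like the sketch below. The names `semi_join` and `filter_edges` are hypothetical, and representing an edge as a `(parent, child)` tuple of strings is an assumption made for illustration; the text only states that each node is joined against the concept list.

```python
def semi_join(edges, key, legal_sorted):
    """Keep the edges whose node at position `key` appears in legal_sorted.

    The edges are sorted on that node, O(n log n), and then merged against
    the sorted concept list in a single linear pass.
    """
    kept = []
    i = 0
    for edge in sorted(edges, key=lambda e: e[key]):
        # Advance the concept pointer; it never moves backwards.
        while i < len(legal_sorted) and legal_sorted[i] < edge[key]:
            i += 1
        if i < len(legal_sorted) and legal_sorted[i] == edge[key]:
            kept.append(edge)
    return kept


def filter_edges(edges, legal_concepts):
    """Delete every edge that has a node outside the legal concept list."""
    legal_sorted = sorted(set(legal_concepts))
    kept = semi_join(edges, 0, legal_sorted)  # parent node must be legal
    return semi_join(kept, 1, legal_sorted)   # child node must be legal
```

The two sorts dominate the cost, so the sketch stays within the stated $O(n \log n)$ bound, whereas checking every edge against every concept would be quadratic.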
In the encoding output step, all concepts are encoded and then output to the front end of the website. For example, a compression encoding may be used, which reduces the memory consumption of the online service.
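The text does not say which compression encoding is used. As one possible illustration only, the sketch below applies a simple dictionary encoding in which each first-order concept is replaced by a small integer; the scheme itself, the `_` separator, and the names `build_codebook`, `encode`, and `decode` are all assumptions.

```python
def build_codebook(concepts, sep="_"):
    """Assign an integer ID to every first-order concept that occurs."""
    atoms = sorted({atom for concept in concepts for atom in concept.split(sep)})
    return {atom: i for i, atom in enumerate(atoms)}


def encode(concept, codebook, sep="_"):
    """Encode a concept as a tuple of integer IDs (assumed scheme)."""
    return tuple(codebook[atom] for atom in concept.split(sep))


def decode(code, codebook, sep="_"):
    """Recover the concept string from its integer tuple."""
    reverse = {i: atom for atom, i in codebook.items()}
    return sep.join(reverse[i] for i in code)
```

With such a scheme each distinct first-order string is stored once in the codebook, and every higher-order concept only stores short integer tuples.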
In the above embodiment, the association relationships of the feature data of the article are established starting from the high-level higher-order feature data, which contains a larger number of first-order feature data. Because high-level higher-order feature data contains the information of low-level higher-order feature data, there is no need to search the whole feature data set while constructing the association relationships. Thus, the time complexity of establishing the associations can be reduced, and the efficiency of data association can be improved.
Fig. 4 illustrates a block diagram of some embodiments of a data association establishing apparatus of the present disclosure.
As shown in fig. 4, the data-association establishing apparatus 4 includes a combining unit 41, a rank determining unit 42, a feature determining unit 43, and an establishing unit 44.
The combining unit 41 combines the first-order feature data in the related description of the article to generate a plurality of higher-order feature data. Each higher-order feature data includes a plurality of first-order feature data.
In some embodiments, the combining unit 41 combines the first-order feature data in the related description of the article according to a preset combination order to generate the plurality of higher-order feature data.
The rank determining unit 42 determines the level of each higher-order feature data based on the number of first-order feature data it contains. The higher the level, the more first-order feature data the higher-order feature data contains.
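The combination and level determination performed by units 41 and 42 can be sketched as below. This is an illustrative sketch under stated assumptions: the preset combination order is taken to be lexicographic order, the separator is `_`, and the function name `combine_features` and the `max_order` parameter are hypothetical.

```python
from itertools import combinations


def combine_features(first_order, max_order=3, sep="_"):
    """Map each generated higher-order feature to its level.

    The level of a higher-order feature is the number of first-order
    features it contains.
    """
    ordered = sorted(set(first_order))  # assumed preset combination order
    levels = {}
    for k in range(2, min(max_order, len(ordered)) + 1):
        for combo in combinations(ordered, k):
            levels[sep.join(combo)] = k
    return levels
```

Fixing the combination order means that the same set of first-order features always yields the same higher-order name, so "a_b" and "b_a" never appear as two separate nodes.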
The feature determining unit 43 takes the higher-order feature data of the highest level as the initial feature data and determines the secondary feature data corresponding to each initial feature data. Each secondary feature data is one level lower than its corresponding initial feature data, and each initial feature data contains all the first-order feature data in its corresponding secondary feature data.
The establishing unit 44 establishes, on the basis of each initial feature data, an association relationship between that initial feature data and its corresponding secondary feature data, and then takes each secondary feature data as new initial feature data; the association establishment process is repeated until the initial feature data has been updated to first-order feature data.
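The iterative procedure carried out by units 43 and 44 can be sketched as a loop that descends one level per pass. The representation of feature data as `frozenset` objects and the name `build_associations` are assumptions made for illustration.

```python
from itertools import combinations


def build_associations(top_level):
    """Return (initial, secondary) pairs from the highest level down to level 1.

    top_level: iterable of collections of first-order features, each one a
    highest-level higher-order feature (assumed representation).
    """
    associations = set()
    initial = {frozenset(f) for f in top_level}
    while initial:
        secondary = set()
        for feat in initial:
            if len(feat) < 2:
                continue  # already first-order: the descent stops here
            # Each secondary feature drops exactly one first-order feature,
            # so it is one level lower and fully contained in `feat`.
            for sub in combinations(sorted(feat), len(feat) - 1):
                sub = frozenset(sub)
                associations.add((feat, sub))
                secondary.add(sub)
        initial = secondary  # secondary features become the new initial ones
    return associations
```

Each pass derives the secondary features directly from the initial ones, so no search over the whole feature data set is needed.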
In some embodiments, the establishing unit 44 counts the number of occurrences of each first-order feature data and each higher-order feature data in the related descriptions of the articles of the same category as the article; adds a first-order feature data or higher-order feature data to the candidate feature data set if its number of occurrences is greater than the corresponding threshold; determines whether each first-order feature data and each higher-order feature data belongs to the candidate feature data set; and deletes the corresponding association relationship if a first-order feature data or higher-order feature data does not belong to the candidate feature data set.
In some embodiments, the respective threshold value of each first order feature data is greater than the respective threshold value of each higher order feature data, and the respective threshold value of the higher order feature data of a lower level is greater than the respective threshold value of the higher order feature data of a higher level.
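The counting, thresholding, and pruning described above can be sketched as follows. The sketch assumes that each related description has already been reduced to the set of feature strings it contains and that features are joined with `_`; the names `candidate_set` and `prune` and the threshold values in the usage note are hypothetical.

```python
from collections import Counter


def candidate_set(descriptions, thresholds, sep="_"):
    """Collect the features whose occurrence count exceeds their level's threshold.

    descriptions: iterable of sets of feature strings, one set per article
                  of the same category (assumed input format).
    thresholds:   dict mapping level (number of first-order features) to
                  the count that must be exceeded.
    """
    counts = Counter(f for description in descriptions for f in description)
    return {
        feature
        for feature, n in counts.items()
        if n > thresholds[len(feature.split(sep))]
    }


def prune(associations, candidates):
    """Delete associations that touch a feature outside the candidate set."""
    return {(p, c) for p, c in associations if p in candidates and c in candidates}
```

A threshold table such as `{1: 100, 2: 50, 3: 20}` follows the stated ordering: the more first-order features a combination contains, the less often it occurs, so its threshold is lower.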
In some embodiments, the establishing unit 44 establishes a knowledge graph with each first-order feature data and each higher-order feature data as nodes and with the corresponding association relationships as edges. Edges in the knowledge graph point from first-order feature data to higher-order feature data, or from lower-level higher-order feature data to higher-level higher-order feature data.
In some embodiments, the data association establishing apparatus 4 further includes a recommending unit 45, which, in response to a user searching for first-order feature data or higher-order feature data of an article, recommends corresponding higher-order feature data to the user according to the direction of the edges of the knowledge graph.
In some embodiments, in response to the user searching for first-order feature data of an article, the recommending unit 45 recommends to the user the higher-order feature data that has an association relationship with that first-order feature data; and in response to the user searching for higher-order feature data of the article, it recommends to the user the higher-order feature data that has an association relationship with the searched higher-order feature data and is of a higher level than it.
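A recommendation lookup that follows the edge direction could be sketched as below. The adjacency-map representation and the names `build_graph` and `recommend` are assumptions; associations are taken to be `(higher, lower)` pairs of feature strings.

```python
from collections import defaultdict


def build_graph(associations):
    """Store each association as a directed edge lower -> higher."""
    graph = defaultdict(set)
    for higher, lower in associations:
        graph[lower].add(higher)
    return graph


def recommend(graph, query):
    """Return the features one level above `query`, following edge direction."""
    # .get() avoids inserting an empty entry for unknown queries.
    return sorted(graph.get(query, ()))
```

For instance, with the associations `("a_b", "a")` and `("a_b_c", "a_b")`, a search for `"a"` would yield `"a_b"`, and a search for `"a_b"` would yield `"a_b_c"`.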
In the above embodiment, the association relationships of the feature data of the article are established starting from the high-level higher-order feature data, which contains a larger number of first-order feature data. Because high-level higher-order feature data contains the information of low-level higher-order feature data, there is no need to search the whole feature data set while constructing the association relationships. Thus, the time complexity of establishing the associations can be reduced, and the efficiency of data association can be improved.
Fig. 5 shows a block diagram of further embodiments of the data association establishing apparatus of the present disclosure.
As shown in fig. 5, the data association establishing apparatus 5 of this embodiment includes: a memory 51 and a processor 52 coupled to the memory 51, the processor 52 being configured to perform the method of establishing a data association in any of the embodiments of the present disclosure based on instructions stored in the memory 51.
The memory 51 may include, for example, a system memory, a fixed nonvolatile storage medium, and the like. The system memory stores, for example, an operating system, application programs, boot loader programs, databases, and other programs.
Fig. 6 illustrates a block diagram of still further embodiments of a data association establishing apparatus of the present disclosure.
As shown in fig. 6, the data association establishing apparatus 6 of this embodiment includes: a memory 610 and a processor 620 coupled to the memory 610, the processor 620 being configured to perform the method of establishing a data association in any of the foregoing embodiments based on instructions stored in the memory 610.
The memory 610 may include, for example, system memory, fixed nonvolatile storage media, and the like. The system memory stores, for example, an operating system, application programs, boot loader programs, and other programs.
The data association establishing apparatus 6 may further include an input/output interface 630, a network interface 640, a storage interface 650, and the like. These interfaces 630, 640, 650, the memory 610, and the processor 620 may be connected by, for example, a bus 660. The input/output interface 630 provides a connection interface for input/output devices such as a display, a mouse, a keyboard, and a touch screen. The network interface 640 provides a connection interface for various networking devices. The storage interface 650 provides a connection interface for external storage devices such as SD cards and USB flash drives.
It will be appreciated by those skilled in the art that embodiments of the present disclosure may be provided as a method, system, or computer program product. Accordingly, the present disclosure may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present disclosure may take the form of a computer program product embodied on one or more computer-usable non-transitory storage media having computer-usable program code embodied therein.
Up to this point, the data association establishing method, the data association establishing apparatus, and the computer-readable storage medium according to the present disclosure have been described in detail. In order to avoid obscuring the concepts of the present disclosure, some details known in the art are not described. How to implement the solutions disclosed herein will be fully apparent to those skilled in the art from the above description.
The methods and systems of the present disclosure may be implemented in a number of ways. For example, the methods and systems of the present disclosure may be implemented by software, hardware, firmware, or any combination of software, hardware, firmware. The above-described sequence of steps for the method is for illustration only, and the steps of the method of the present disclosure are not limited to the sequence specifically described above unless specifically stated otherwise. Furthermore, in some embodiments, the present disclosure may also be implemented as programs recorded in a recording medium, the programs including machine-readable instructions for implementing the methods according to the present disclosure. Thus, the present disclosure also covers a recording medium storing a program for executing the method according to the present disclosure.
Although some specific embodiments of the present disclosure have been described in detail by way of example, it should be understood by those skilled in the art that the above examples are for illustration only and are not intended to limit the scope of the present disclosure. It will be appreciated by those skilled in the art that modifications may be made to the above embodiments without departing from the scope and spirit of the disclosure. The scope of the present disclosure is defined by the appended claims.
Claims (10)
1. A method of establishing a data association, comprising:
combining first-order feature data in the related description of an article to generate a plurality of higher-order feature data, wherein each higher-order feature data comprises a plurality of first-order feature data;
determining the level of each higher-order feature data according to the number of first-order feature data contained in that higher-order feature data, wherein the higher the level, the greater the number of first-order feature data contained in the higher-order feature data;
determining the secondary feature data corresponding to each initial feature data by taking the higher-order feature data of the highest level as the initial feature data, wherein each secondary feature data is one level lower than the corresponding initial feature data, and each initial feature data comprises all the first-order feature data in the corresponding secondary feature data;
establishing, according to each initial feature data, an association relationship between the initial feature data and the corresponding secondary feature data; and
repeating the process of establishing the association relationship, taking each secondary feature data as new initial feature data, until the initial feature data is updated to first-order feature data.
2. The establishing method of claim 1, further comprising:
counting the number of occurrences of each first-order feature data and each higher-order feature data in the related descriptions of articles of the same category as the article;
adding the corresponding first-order feature data or higher-order feature data to a candidate feature data set if the number of occurrences is greater than the corresponding threshold;
determining whether each first-order feature data and each higher-order feature data belongs to the candidate feature data set; and
deleting the corresponding association relationship if the first-order feature data or the higher-order feature data does not belong to the candidate feature data set.
3. The method of claim 2, wherein,
the respective threshold value of each first-order feature data is greater than the respective threshold value of each higher-order feature data, and the respective threshold value of the higher-order feature data of a lower level is greater than the respective threshold value of the higher-order feature data of a higher level.
4. The method of claim 1, wherein the combining the first-order feature data in the related description of the item to generate the plurality of higher-order feature data comprises:
combining the first-order feature data in the related description of the article according to a preset combination order to generate the plurality of higher-order feature data.
5. The establishing method according to any one of claims 1-4, further comprising:
establishing a knowledge graph with each first-order feature data and each higher-order feature data as nodes and the corresponding association relationships as edges, wherein the edges in the knowledge graph point from first-order feature data to higher-order feature data, or from lower-level higher-order feature data to higher-level higher-order feature data; and
in response to a user searching for first-order feature data or higher-order feature data of the article, recommending corresponding higher-order feature data to the user according to the direction of the edges of the knowledge graph.
6. The establishing method according to any one of claims 1-4, further comprising:
in response to a user searching for first-order feature data of the article, recommending to the user the higher-order feature data that has an association relationship with the first-order feature data; and
in response to the user searching for higher-order feature data of the article, recommending to the user the higher-order feature data that has an association relationship with the searched higher-order feature data and is of a higher level than the searched higher-order feature data.
7. A data association establishing apparatus, comprising:
a combining unit, configured to combine first-order feature data in the related description of an article to generate a plurality of higher-order feature data, wherein each higher-order feature data comprises a plurality of first-order feature data;
a level determining unit, configured to determine the level of each higher-order feature data according to the number of first-order feature data contained in that higher-order feature data, wherein the higher the level, the greater the number of first-order feature data contained in the higher-order feature data;
a feature determining unit, configured to determine the secondary feature data corresponding to each initial feature data by taking the higher-order feature data of the highest level as the initial feature data, wherein each secondary feature data is one level lower than the corresponding initial feature data, and each initial feature data comprises all the first-order feature data in the corresponding secondary feature data; and
an establishing unit, configured to establish, according to each initial feature data, an association relationship between the initial feature data and the corresponding secondary feature data, to take each secondary feature data as new initial feature data, and to repeat the process of establishing the association relationship until the initial feature data is updated to first-order feature data.
8. The establishing apparatus according to claim 7, wherein,
the establishing unit counts the number of occurrences of each first-order feature data and each higher-order feature data in the related descriptions of articles of the same category as the article, adds the corresponding first-order feature data or higher-order feature data to a candidate feature data set if the number of occurrences is greater than the corresponding threshold, determines whether each first-order feature data and each higher-order feature data belongs to the candidate feature data set, and deletes the corresponding association relationship if the first-order feature data or the higher-order feature data does not belong to the candidate feature data set.
9. A data association establishing apparatus, comprising:
a memory; and
a processor coupled to the memory, the processor configured to perform the method of establishing a data association of any of claims 1-6 based on instructions stored in the memory.
10. A computer readable storage medium having stored thereon a computer program which when executed by a processor implements the method of establishing a data association according to any of claims 1-6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911220575.3A CN111782932B (en) | 2019-12-03 | 2019-12-03 | Method, device and computer readable storage medium for establishing data association |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111782932A (en) | 2020-10-16 |
CN111782932B (en) | 2023-12-05 |
Family
ID=72755298
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201911220575.3A Active CN111782932B (en) | 2019-12-03 | 2019-12-03 | Method, device and computer readable storage medium for establishing data association |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111782932B (en) |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108510003A (en) * | 2018-03-30 | 2018-09-07 | 深圳广联赛讯有限公司 | Car networking big data air control assemblage characteristic extracting method, device and storage medium |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9098575B2 (en) * | 2011-06-20 | 2015-08-04 | Primal Fusion Inc. | Preference-guided semantic processing |
US10706103B2 (en) * | 2018-01-30 | 2020-07-07 | Microsoft Technology Licensing, Llc | System and method for hierarchical distributed processing of large bipartite graphs |
Non-Patent Citations (1)
Title |
---|
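Research on the semantic dynamic discovery and association construction mechanism of linked data; 成全 (Cheng Quan); 周兰芳 (Zhou Lanfang); 情报科学 (Information Science), (10); full text * |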
Also Published As
Publication number | Publication date |
---|---|
CN111782932A (en) | 2020-10-16 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107526807B (en) | Information recommendation method and device | |
US9680830B2 (en) | Evaluating security of data access statements | |
CN109960761A (en) | Information recommendation method, device, equipment and computer readable storage medium | |
US20160232452A1 (en) | Method and device for recognizing spam short messages | |
CN103593442B (en) | The De-weight method and device of daily record data | |
CN105335368B (en) | A kind of product clustering method and device | |
US20190050672A1 (en) | INCREMENTAL AUTOMATIC UPDATE OF RANKED NEIGHBOR LISTS BASED ON k-th NEAREST NEIGHBORS | |
CN103870553A (en) | Input resource pushing method and system | |
CN107229674A (en) | A kind of data migration device, server and method | |
CN106803092B (en) | Method and device for determining standard problem data | |
CN109213972B (en) | Method, device, equipment and computer storage medium for determining document similarity | |
CN114690731B (en) | Associated scene recommendation method and device, storage medium and electronic device | |
CN111209741A (en) | Processing method and device of table data dictionary | |
US20210357955A1 (en) | User search category predictor | |
CN111782932B (en) | Method, device and computer readable storage medium for establishing data association | |
CN104217016B (en) | Webpage search keyword statistical method and device | |
CN112860626B (en) | A document sorting method, device and electronic equipment | |
CN105589683B (en) | Sample extraction method and device | |
CN118193801A (en) | Data retrieval method, index construction method and article retrieval method | |
CN107992526B (en) | Anchor recommendation method, storage device and computer device | |
CN110312166B (en) | Live broadcast room message filtering method and device, electronic equipment and storage medium | |
CN107248929B (en) | Strong correlation data generation method of multi-dimensional correlation data | |
CN110059243A (en) | Data optimization engine method, apparatus, equipment and computer readable storage medium | |
CN115795156A (en) | Material recall and neural network training method, device, equipment and storage medium | |
CN114117192B (en) | Object query method, device, server and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||