Disclosure of Invention
The application provides an information mining method and device and an information recommending method and device, which can improve the efficiency of information processing and the user experience.
The embodiment of the invention provides an information mining method, which comprises the following steps:
acquiring an edge activation vector in a heterogeneous graph;
optimizing the acquired edge activation vector at least according to the degree of association between two nodes in the heterogeneous graph, to obtain an edge activation vector with task characteristics;
performing a random walk on the heterogeneous graph after edge activation vector optimization to generate the node sequence required for graph embedding learning, and learning a graph embedding representation with specific task characteristics based on the node sequence generated by the random walk.
In one illustrative example, acquiring the edge activation vector in the heterogeneous graph includes:
initializing k binary heterogeneous edge activation vectors according to the different types of edges in the heterogeneous graph, wherein the value of k is related to the complexity of the heterogeneous graph: the more complex the heterogeneous graph and the more edge types it has, the larger the value of k.
In one illustrative example, the degree of association between two nodes in the heterogeneous graph is obtained by a fitness function that evaluates how appropriate different edge activation vectors are in the heterogeneous graph.
In an exemplary embodiment, optimizing the acquired edge activation vector according to the fitness function includes:
calculating, by using the fitness function, the fitness function values corresponding to p different edge activation vectors on a specific task, wherein each edge activation vector comprises k values;
selecting p/2 pairs of edge activation vectors in proportion to the fitness function values;
performing intra-pair feature crossover on the selected p/2 pairs of edge activation vectors, and then performing gene mutation, to generate p new edge activation vectors;
and returning to the step of calculating the fitness function values and repeating until convergence, and selecting the edge activation vector with the optimal gain as the edge activation vector with task characteristics.
In one illustrative example, a graph algorithm is used to obtain the degree of association between two nodes in the heterogeneous graph.
The present application also provides a computer-readable storage medium storing computer-executable instructions for performing the information mining method of any one of the above.
The application also provides a device for realizing information mining, which comprises a memory and a processor, wherein the memory stores instructions executable by the processor for executing the steps of the information mining method.
The application also provides an information recommendation method, which comprises the following steps:
acquiring an edge activation vector in a heterogeneous graph, and optimizing the acquired edge activation vector at least according to the degree of association between two nodes in the heterogeneous graph, to obtain an edge activation vector with task characteristics;
performing a random walk on the heterogeneous graph after edge activation vector optimization to generate the node sequence required for graph embedding learning, and learning a graph embedding representation with specific task characteristics based on the node sequence generated by the random walk;
and constructing a user portrait according to the learned graph embedding representation with specific task characteristics, and recommending information to the user according to the constructed user portrait.
In one illustrative example, the degree of association between two nodes in the heterogeneous graph is obtained by a fitness function that evaluates how appropriate different edge activation vectors are in the heterogeneous graph.
In one exemplary embodiment, optimizing the acquired edge activation vector at least according to the fitness function includes:
calculating, by using the fitness function, the fitness function values corresponding to p different edge activation vectors on a specific task, wherein each edge activation vector comprises k values, and the value of k is related to the complexity of the heterogeneous graph: the more complex the heterogeneous graph and the more edge types it has, the larger the value of k;
selecting p/2 pairs of edge activation vectors in proportion to the fitness function values;
performing intra-pair feature crossover on the selected p/2 pairs of edge activation vectors, and then performing gene mutation, to generate p new edge activation vectors;
and returning to the step of calculating the fitness function values and repeating until convergence, and selecting the edge activation vector with the optimal gain as the edge activation vector with task characteristics.
The present application further provides a computer-readable storage medium storing computer-executable instructions for performing the information recommendation method of any one of the above.
The application further provides a device for realizing information recommendation, which comprises a memory and a processor, wherein the memory stores instructions executable by the processor for executing the steps of the information recommendation method.
The application simplifies a large-scale heterogeneous graph based on the edge activation vector and greatly compresses the scale of the heterogeneous graph. This addresses the low operation efficiency caused by an excessively large heterogeneous graph, as well as the problems that arise when a heterogeneous graph is converted into a homogeneous graph, such as unreasonable manually specified edge weights and the introduction of invalid edges. That is, with the information mining method, the running time is significantly reduced while better results are obtained than with related-art methods, so that the user experience is improved.
According to the information recommendation method, the user representation obtained directly from the heterogeneous graph fuses multiple kinds of behavior data, so that more accurate information recommendation is ensured and the user experience is improved.
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present application more apparent, embodiments of the present application will be described in detail hereinafter with reference to the accompanying drawings. It should be noted that, without conflict, the embodiments of the present application and features of the embodiments may be arbitrarily combined with each other.
In one typical configuration of the application, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in computer-readable media, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of computer-readable media.
Computer-readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of storage media for a computer include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. Computer-readable media, as defined herein, does not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.
The steps illustrated in the flowchart of the figures may be performed in a computer system, such as a set of computer-executable instructions. Also, while a logical order is depicted in the flowchart, in some cases, the steps depicted or described may be performed in a different order than presented herein.
FIG. 1 is a flowchart of the information mining method of the present application. As shown in FIG. 1, the method comprises:
Step 100, acquiring an edge activation vector in a heterogeneous graph.
In one illustrative example, this step may include:
initializing k binary heterogeneous edge activation vectors according to the different types of edges in the heterogeneous graph. For example, if there are 20 different types of edges in the heterogeneous graph, then k 20-dimensional vectors can be constructed by initialization, wherein each dimension of a vector takes a value of either 0 or 1, and the value is chosen randomly between 0 and 1 at the beginning.
The value of k is a variable parameter related to the complexity of the heterogeneous graph: the more complex the heterogeneous graph and the more edge types it has, the larger the value of k.
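The initialization in step 100 can be sketched as follows. This is a minimal illustration only; the function name, the seed parameter, and the choice k = 8 are assumptions made for the example and do not come from the application.

```python
import random

def init_edge_activation_vectors(num_edge_types, k, seed=None):
    """Return k random binary vectors with one 0/1 entry per edge type."""
    rng = random.Random(seed)
    return [[rng.randint(0, 1) for _ in range(num_edge_types)]
            for _ in range(k)]

# Example from the text: 20 edge types, so every vector is 20-dimensional.
# k = 8 is an arbitrary value chosen for illustration.
vectors = init_edge_activation_vectors(num_edge_types=20, k=8, seed=0)
```

An entry of 1 at position t is read here as "edge type t is active", and 0 as "edge type t is switched off".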
Step 101, optimizing the acquired edge activation vector at least according to the degree of association between two nodes in the heterogeneous graph, to obtain an edge activation vector with task characteristics.
In one illustrative example, the degree of association between two nodes in the heterogeneous graph may be obtained by a fitness function that evaluates how appropriate different edge activation vectors are in the heterogeneous graph.
In an illustrative example, the fitness function, which evaluates how appropriate different edge activation vectors are in the heterogeneous graph, may be as shown in equation (1):
In equation (1), i and j denote two nodes in the heterogeneous graph; z denotes a common neighbor of node i and node j in the heterogeneous graph; n and m denote specific edge types; C is a candidate edge activation vector; E is an edge of a certain type in the heterogeneous graph; ω(E) is the weight of edge type E; E_C(v) denotes the set of currently valid edges of node v under a given edge activation vector C; and E_C(i, j) denotes the set of valid edges between node i and node j under a given edge activation vector C.
The fitness function shown in equation (1) characterizes well how much effective information is retained between nodes. The fitness function values of p different edge activation vectors are calculated on a specific task, and p/2 pairs of edge activation vectors are selected in proportion to those values: the higher the fitness, the higher the probability of being selected. Each edge activation vector includes k values.
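Equation (1) itself is not reproduced here, so the following sketch illustrates only the sets named in its symbol definitions: E_C(v), E_C(i, j), and the common neighbors z. The edge representation as (node, node, edge type index) tuples and all function names are assumptions made for illustration.

```python
def valid_edges_of_node(edges, C, v):
    """E_C(v): edges incident to v whose type is switched on in C."""
    return {(a, b, t) for (a, b, t) in edges if C[t] == 1 and v in (a, b)}

def valid_edges_between(edges, C, i, j):
    """E_C(i, j): active edges that directly connect i and j."""
    return {(a, b, t) for (a, b, t) in edges
            if C[t] == 1 and {a, b} == {i, j}}

def common_neighbors(edges, C, i, j):
    """Nodes z adjacent to both i and j over active edges."""
    def nbrs(v):
        incident = valid_edges_of_node(edges, C, v)
        return {b if a == v else a for (a, b, _) in incident} - {v}
    return (nbrs(i) & nbrs(j)) - {i, j}

# Hypothetical toy graph with three edge types (indices 0, 1, 2).
edges = {("u1", "a1", 0), ("u2", "a1", 0), ("u1", "u2", 1),
         ("u1", "a2", 2), ("u2", "a2", 2)}
C = [1, 1, 0]  # edge type 2 is deactivated

shared = common_neighbors(edges, C, "u1", "u2")
direct = valid_edges_between(edges, C, "u1", "u2")
```

With edge type 2 deactivated, node a2 no longer counts as a common neighbor of u1 and u2, which is how a candidate vector C changes the information available to the fitness function.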
In one illustrative example, optimizing the acquired edge activation vector according to the fitness function in this step includes:
calculating, on a specific task, the fitness function values corresponding to p different edge activation vectors, wherein each edge activation vector comprises k values;
selecting p/2 pairs of edge activation vectors in proportion to the fitness function values;
performing intra-pair feature crossover on the selected p/2 pairs of edge activation vectors, and then performing gene mutation (a term from genetic algorithms), that is, inverting the value at a certain position of an activation vector with a certain probability, to produce p new edge activation vectors; and
returning to the step of calculating the fitness function values and repeating until convergence, for example until the number of iterations reaches a set value, and selecting the edge activation vector with the optimal gain (for example, the edge activation vector with the maximum fitness function value) as the edge activation vector with task characteristics.
In an exemplary embodiment, in the process of optimizing the acquired edge activation vector according to the fitness function, the specific task may be, for example, a scenario task of recommending related articles to a user; under this task, the user's operation data on articles (such as reading, liking, and forwarding) may be used as the input of the algorithm.
In an exemplary embodiment, in the process of optimizing the acquired edge activation vector according to the fitness function, selecting proportionally may include selecting edge activation vectors according to their calculated fitness values; for example, the higher the fitness of an edge activation vector, the greater its chance of being selected.
In an exemplary embodiment, in the process of optimizing the acquired edge activation vector according to the fitness function, the certain probability may be a probability value set in advance by the user, and the certain position may be a randomly selected position. In an exemplary embodiment, for the two activation vectors of a pair, a position is randomly selected and the values of the two activation vectors at this position are swapped with a fixed probability.
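The loop described above can be sketched as a small genetic algorithm. This is an illustration under stated assumptions, not the application's implementation: the fitness function is passed in as a placeholder (equation (1) is not reproduced here), the toy fitness and target vector are invented for the example, the rates are arbitrary, and p is assumed to be even.

```python
import random

def evolve(population, fitness, generations=30,
           crossover_rate=0.9, mutation_rate=0.1, seed=None):
    """Evolve binary edge activation vectors; return the fittest final one."""
    rng = random.Random(seed)
    pop = [list(v) for v in population]
    p = len(pop)  # assumed even, so p // 2 pairs yield p children
    for _ in range(generations):  # stop after a set number of iterations
        scores = [fitness(v) for v in pop]  # must be positive weights
        children = []
        for _ in range(p // 2):
            # Fitness-proportional selection of one pair (copied).
            a, b = (list(v) for v in rng.choices(pop, weights=scores, k=2))
            # Intra-pair crossover: swap the values at one random position.
            if rng.random() < crossover_rate:
                pos = rng.randrange(len(a))
                a[pos], b[pos] = b[pos], a[pos]
            # Mutation: flip one random position with a set probability.
            for child in (a, b):
                if rng.random() < mutation_rate:
                    idx = rng.randrange(len(child))
                    child[idx] = 1 - child[idx]
                children.append(child)
        pop = children
    return max(pop, key=fitness)

# Hypothetical toy fitness: 1 plus the number of bits matching a target.
target = [1, 0, 1, 1, 0, 0]
def toy_fitness(v):
    return 1 + sum(1 for x, y in zip(v, target) if x == y)

init_rng = random.Random(1)
start = [[init_rng.randint(0, 1) for _ in target] for _ in range(4)]
best = evolve(start, toy_fitness, seed=2)
```

Returning the best vector of the final generation mirrors the "maximum fitness function value" example in the text; tracking the best vector seen across all generations would be an equally reasonable design choice.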
In one illustrative example, this step may include:
obtaining the degree of association between two nodes in the heterogeneous graph by using a graph algorithm from the related art, such as SimRank.
In an illustrative example, when the acquired edge activation vector is optimized according to a related-art graph algorithm in this step, the genetic algorithm is no longer used to learn the edge activation vector; instead, the edge activation vector is treated as part of the training parameters of the graph algorithm and is learned directly by the graph algorithm.
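For reference, a plain iterative SimRank on an undirected graph can be sketched as below. The sketch only computes node-to-node similarity; it does not show how the edge activation vector would be learned as a training parameter, which the text does not detail. The adjacency format, the decay factor 0.8, and the iteration count are assumptions.

```python
from itertools import product

def simrank(neighbors, decay=0.8, iterations=10):
    """Iterative SimRank; neighbors maps each node to a set of nodes."""
    nodes = list(neighbors)
    sim = {(a, b): 1.0 if a == b else 0.0
           for a, b in product(nodes, nodes)}
    for _ in range(iterations):
        new = {}
        for a, b in product(nodes, nodes):
            if a == b:
                new[(a, b)] = 1.0
            elif not neighbors[a] or not neighbors[b]:
                new[(a, b)] = 0.0
            else:
                # Average similarity over all pairs of neighbors.
                total = sum(sim[(x, y)]
                            for x in neighbors[a] for y in neighbors[b])
                new[(a, b)] = (decay * total
                               / (len(neighbors[a]) * len(neighbors[b])))
        sim = new
    return sim

# Hypothetical toy graph: two users who both interacted with article a1.
graph = {"u1": {"a1"}, "u2": {"a1"}, "a1": {"u1", "u2"}}
scores = simrank(graph)
```

In this toy graph the two users share their only neighbor, so their similarity equals the decay factor.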
The method uses the edge activation vector to simplify and compress the large-scale heterogeneous graph: edges and nodes that are irrelevant to the subsequent task are deleted, and the scale of the graph is reduced to different degrees depending on the activation vector. The simplified heterogeneous graph can then be fed into related-art methods for graph embedding learning on heterogeneous graphs, so that operation efficiency is improved while the application effect is ensured.
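The simplification described above can be sketched as a filter over typed edges. This is a minimal illustration assuming edges are stored as (node, node, edge type index) tuples; the function name and the toy data are hypothetical.

```python
def simplify_graph(edges, activation):
    """Keep edges whose type is active; drop nodes left without edges."""
    kept = [(u, v, t) for (u, v, t) in edges if activation[t] == 1]
    nodes = {n for (u, v, _) in kept for n in (u, v)}
    return kept, nodes

# Hypothetical toy graph: edge type 1 (user-to-tag) is deactivated.
all_edges = [("u1", "a1", 0), ("u2", "a1", 0),
             ("u1", "t1", 1), ("u2", "a2", 2)]
kept_edges, kept_nodes = simplify_graph(all_edges, activation=[1, 0, 2 - 1])
```

Node t1 was reachable only through the deactivated edge type, so it disappears from the simplified graph together with its edge.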
Step 102, performing a random walk on the heterogeneous graph after edge activation vector optimization to generate the node sequence required for graph embedding learning, and learning a graph embedding representation with specific task characteristics based on the node sequence generated by the random walk.
This step may be implemented with methods provided in the related art, and the specific implementation does not limit the protection scope of the present application. It is emphasized that the heterogeneous graph used here is the heterogeneous graph that has been simplified by the present application using the edge activation vector.
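The walk generation in step 102 can be sketched as uniform random walks over the simplified graph. The parameters walks_per_node and walk_length, the uniform choice of the next node, and the edge format are assumptions; the application leaves the concrete walk and embedding method to the related art.

```python
import random
from collections import defaultdict

def build_adjacency(edges):
    """Undirected adjacency lists from (node, node, type) tuples."""
    adj = defaultdict(list)
    for u, v, _ in edges:
        adj[u].append(v)
        adj[v].append(u)
    return adj

def random_walks(adj, walks_per_node=2, walk_length=5, seed=None):
    """Generate uniform random walks starting from every node."""
    rng = random.Random(seed)
    walks = []
    for start in sorted(adj):
        for _ in range(walks_per_node):
            walk = [start]
            while len(walk) < walk_length and adj[walk[-1]]:
                walk.append(rng.choice(adj[walk[-1]]))
            walks.append(walk)
    return walks

# Hypothetical simplified graph (edge types already filtered).
simplified = [("u1", "a1", 0), ("u2", "a1", 0), ("u2", "a2", 2)]
adjacency = build_adjacency(simplified)
walks = random_walks(adjacency, seed=0)
```

The resulting node sequences would then serve as training input for a sequence-based graph embedding learner, for example a skip-gram style model, which is outside the scope of this sketch.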
The application simplifies a large-scale heterogeneous graph based on the edge activation vector and greatly compresses the scale of the heterogeneous graph. Practical application shows that the total number of edges in the heterogeneous graph can be reduced by about 76% and the total number of nodes by about 65%. At the same time, the application addresses the low efficiency of operating directly on an excessively large heterogeneous graph, as well as the problems that arise when a heterogeneous graph is converted into a homogeneous graph, such as unreasonable manually specified edge weights and the introduction of invalid edges. That is, with the information mining method, the running time is significantly reduced while better results are obtained than with related-art methods, so that the user experience is improved.
The information mining method provided by the application can be used for subsequent tasks based on the mined information, such as ranking and information recommendation.
The present application also provides a computer-readable storage medium storing computer-executable instructions for performing the information mining method of any one of the above.
The application further provides a device for realizing information mining, which comprises a memory and a processor, wherein the memory stores a computer program capable of running on the processor, and the computer program realizes the steps of any one of the information mining methods when being executed by the processor.
FIG. 2 is a flowchart of the information recommendation method of the present application. As shown in FIG. 2, the method comprises:
Step 200, acquiring an edge activation vector in a heterogeneous graph, and optimizing the acquired edge activation vector at least according to the degree of association between two nodes in the heterogeneous graph, to obtain an edge activation vector with task characteristics.
The specific implementation of this step is described in step 100 and step 101 and is not repeated here.
Step 201, performing a random walk on the heterogeneous graph after edge activation vector optimization to generate the node sequence required for graph embedding learning, and learning a graph embedding representation with specific task characteristics based on the node sequence generated by the random walk.
Step 202, constructing a user portrait according to the learned graph embedding representation with specific task characteristics, and recommending information to the user according to the constructed user portrait.
According to the information recommendation method, the user representation obtained directly from the heterogeneous graph fuses multiple kinds of behavior data, so that more accurate information recommendation is ensured and the user experience is improved.
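One possible reading of step 202 is sketched below. The application does not specify how the user portrait is built or how candidates are ranked, so treating the user's learned embedding as the portrait and ranking items by cosine similarity are assumptions, as are all names and the toy embedding values.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def recommend(user_vec, item_vecs, top_n=2):
    """Rank items by similarity to the user portrait; return the top ones."""
    ranked = sorted(item_vecs,
                    key=lambda item: cosine(user_vec, item_vecs[item]),
                    reverse=True)
    return ranked[:top_n]

# Hypothetical learned embeddings (values invented for illustration).
user_portrait = [1.0, 0.0]
item_embeddings = {"a1": [0.9, 0.1], "a2": [0.0, 1.0], "a3": [0.7, 0.7]}
top_items = recommend(user_portrait, item_embeddings)
```

Because users and items are embedded from the same simplified heterogeneous graph, their vectors live in one space and can be compared directly in this way.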
The present application also provides a computer-readable storage medium storing computer-executable instructions for performing the information recommendation method of any one of the above.
The application further provides a device for realizing information recommendation, which comprises a memory and a processor, wherein the memory stores a computer program capable of running on the processor, and the computer program realizes the steps of any information recommendation method when being executed by the processor.
Although the embodiments of the present application are described above, the embodiments are only used for facilitating understanding of the present application, and are not intended to limit the present application. Any person skilled in the art can make any modification and variation in form and detail without departing from the spirit and scope of the present disclosure, but the scope of the present disclosure is to be determined by the appended claims.