CN118869264A - A method, device, medium and product for identifying abnormalities in user operation log data - Google Patents
A method, device, medium and product for identifying abnormalities in user operation log data
- Publication number
- CN118869264A (application CN202410857621.5A)
- Authority
- CN
- China
- Prior art keywords
- fitness
- group
- individuals
- node
- individual
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/14—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
- H04L63/1408—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
- H04L63/1425—Traffic logging, e.g. anomaly detection
Landscapes
- Engineering & Computer Science (AREA)
- Computer Security & Cryptography (AREA)
- Computer Hardware Design (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The application discloses a method, a device, a medium and a product for identifying abnormal user operation log data. Different logs in the obtained user operation log data are taken as network nodes, and the mutual information between different nodes is calculated; undirected edges are determined according to the magnitude of the mutual information between different nodes, an undirected minimum support tree is constructed, and the nodes in the undirected minimum support tree are oriented to determine a preliminary minimum support tree structure; a node sequence search is performed in the minimum support tree structure to obtain a node sequence group, different node sequences are taken as different individuals in an initial population, and the fitness of the different individuals in the initial population is calculated; the initial population is updated according to the fitness of the different individuals and a preset population optimizing strategy; and the node sequence of the individual with the greatest fitness in the updated population is taken as the user operation abnormal log. The scheme of the application can improve the optimizing efficiency and optimizing precision.
Description
Technical Field
The invention relates to the technical field of communication security, in particular to a method, a device, a medium and a product for identifying abnormal data of a user operation log.
Background
The volume of service data in communication enterprises is huge, and data security risks lurk within such massive data. To discover latent and accumulated data security problems, abnormal operation data needs to be identified through data mining analysis so that security management can be realized. The Bayesian network is an important method in the field of data mining and can handle uncertainty; a Bayesian network structure learning algorithm can be used to analyze the abnormal operation logs generated under uncertain user behavior.
However, existing methods for detecting abnormal operation logs of communication services mainly improve a text classification algorithm or classify on the basis of statistical data analysis, selecting some key fields of the user logs for analysis according to established rules. These methods cannot mine and analyze well the logs generated under uncertain user behavior, rely excessively on existing data, learn poorly on small data sets, and have low optimizing precision and efficiency.
Disclosure of Invention
The invention provides a method, a device, a medium and a product for identifying abnormalities in user operation log data which, compared with the prior art, can improve the optimizing efficiency and optimizing precision.
The embodiment of the invention provides a method for identifying abnormalities in user operation log data, which comprises the following steps:
taking different logs in the obtained user operation log data as network nodes, and calculating mutual information among different nodes;
determining undirected edges according to the magnitude of the mutual information between different nodes, constructing an undirected minimum support tree, orienting the nodes in the undirected minimum support tree, and determining a preliminary minimum support tree structure;
performing node sequence search in the minimum support tree structure to obtain a node sequence group, taking different node sequences as different individuals in an initial population, and calculating the fitness of the different individuals in the initial population;
updating the initial population according to the fitness of different individuals and a preset population optimizing strategy;
and taking the node sequence of the individual with the greatest fitness in the updated population as a user operation abnormal log.
Preferably, updating the initial population according to the fitness of different individuals and a preset population optimizing strategy includes:
selecting, in descending order of fitness, a preset first number of individuals from the initial population as a hunting group, and selecting a preset second number of individuals outside the hunting group as a wandering group;
calculating the fitness of individuals in the hunting group and the fitness of individuals in the wandering group;
updating the individuals in the initial population according to the fitness of the individuals in the hunting group and the fitness of the individuals in the wandering group;
when a preset update termination condition is not met, selecting a hunting group and a wandering group from the updated initial population, and updating the individuals in the initial population according to the fitness of the individuals in the hunting group and the fitness of the individuals in the wandering group until the update termination condition is met;
and stopping the updating of the initial population when the update termination condition is met.
Further, updating the individuals in the initial population according to the fitness of the individuals in the hunting group and the fitness of the individuals in the wandering group comprises the following steps:
updating a first individual in the wandering group into the hunting group when the fitness of the first individual is greater than the fitness of a second individual in the hunting group;
and when the fitness of all individuals in the hunting group is not less than the fitness of a third individual in the wandering group, performing an individual update on the third individual.
Preferably, updating the individuals in the initial population according to the fitness of the individuals in the hunting group and the fitness of the individuals in the wandering group further includes:
selecting a preset third number of individuals from the initial population, excluding the hunting group and the wandering group, as an intake group;
calculating the fitness of individuals in the intake group;
and when the fitness of a fourth individual in the intake group is greater than the fitness of the second individual in the hunting group, recalculating the fitness of the individuals in the initial population, and reselecting the first number of individuals from the initial population as the hunting group in descending order of fitness.
Preferably, updating the individuals in the initial population according to the fitness of the individuals in the hunting group and the fitness of the individuals in the wandering group further includes:
calculating the absolute value of the difference between the fitness of the fourth individual in the intake group and the fitness of the second individual in the hunting group;
and when the calculated absolute value is within a range interval determined according to the fitness of the fourth individual and the fitness of the second individual, selecting a partial node segment of the second individual, replacing the mapped node segment at the same position in the fourth individual with it, and thereby updating the fourth individual.
Preferably, updating the individuals in the initial population according to the fitness of the individuals in the hunting group and the fitness of the individuals in the wandering group further includes:
and when a node of the fourth individual outside the mapped node segment duplicates the node at a first position in the partial node segment, retaining the node corresponding to the first position in the fourth individual.
Preferably, the individual update of the third individual comprises:
calculating a dynamic mutation probability according to a preset dynamic self-mutation strategy;
determining the corresponding mutation node segment length according to the dynamic mutation probability;
determining all mutation positions in the third individual according to the mutation node segment length;
determining the mutation node segments through node positioning;
traversing and replacing each mutation position in the third individual according to the mutation node segments to obtain a plurality of new mutated individuals;
and calculating the fitness of the new mutated individuals in the wandering group, and taking the new mutated individual with the greatest fitness as the updated third individual.
Preferably, orienting the nodes in the undirected minimum support tree to determine a preliminary minimum support tree structure includes:
for each node of the undirected minimum support tree, taking each element of the node's potential parent node set in turn as the parent node of the node and constructing the corresponding substructures; calculating the matching score of each substructure against a preset standard training data set; and taking the candidate whose substructure has the highest matching score as the parent node of the node;
and connecting each node of the undirected minimum support tree with its corresponding parent node to obtain the minimum support tree structure.
The embodiment of the invention also provides a device for identifying abnormalities in user operation log data, which comprises:
a mutual information calculation module, used for taking different logs in the acquired user operation log data as network nodes and calculating mutual information among different nodes;
a support tree determining module, used for determining undirected edges according to the magnitude of the mutual information among different nodes, constructing an undirected minimum support tree, orienting the nodes in the undirected minimum support tree and determining a preliminary minimum support tree structure;
a population determining module, used for searching node sequences in the minimum support tree structure to obtain a node sequence group, taking different node sequences as different individuals in the initial population, and calculating the fitness of the different individuals in the initial population;
a population updating module, used for updating the initial population according to the fitness of different individuals and a preset population optimizing strategy;
and a result output module, used for taking the node sequence of the individual with the greatest fitness in the updated population as a user operation abnormal log.
Preferably, the population updating module is specifically configured to:
select, in descending order of fitness, a preset first number of individuals from the initial population as a hunting group, and select a preset second number of individuals outside the hunting group as a wandering group;
calculate the fitness of individuals in the hunting group and the fitness of individuals in the wandering group;
update the individuals in the initial population according to the fitness of the individuals in the hunting group and the fitness of the individuals in the wandering group;
when a preset update termination condition is not met, select a hunting group and a wandering group from the updated initial population, and update the individuals in the initial population according to the fitness of the individuals in the hunting group and the fitness of the individuals in the wandering group until the update termination condition is met;
and stop the updating of the initial population when the update termination condition is met.
Preferably, the population updating module is specifically configured to:
update a first individual in the wandering group into the hunting group when the fitness of the first individual is greater than the fitness of a second individual in the hunting group;
and when the fitness of all individuals in the hunting group is not less than the fitness of a third individual in the wandering group, perform an individual update on the third individual.
Preferably, the population updating module is specifically configured to:
select a preset third number of individuals from the initial population, excluding the hunting group and the wandering group, as an intake group;
calculate the fitness of individuals in the intake group;
and when the fitness of a fourth individual in the intake group is greater than the fitness of the second individual in the hunting group, recalculate the fitness of the individuals in the initial population, and reselect the first number of individuals from the initial population as the hunting group in descending order of fitness.
Preferably, the population updating module is specifically configured to:
calculate the absolute value of the difference between the fitness of the fourth individual in the intake group and the fitness of the second individual in the hunting group;
and when the calculated absolute value is within a range interval determined according to the fitness of the fourth individual and the fitness of the second individual, select a partial node segment of the second individual, replace the mapped node segment at the same position in the fourth individual with it, and thereby update the fourth individual.
Preferably, the population updating module is specifically configured to:
when a node of the fourth individual outside the mapped node segment duplicates the node at a first position in the partial node segment, retain the node corresponding to the first position in the fourth individual.
Preferably, the population updating module is specifically configured to:
calculate a dynamic mutation probability according to a preset dynamic self-mutation strategy;
determine the corresponding mutation node segment length according to the dynamic mutation probability;
determine all mutation positions in the third individual according to the mutation node segment length;
determine the mutation node segments through node positioning;
traverse and replace each mutation position in the third individual according to the mutation node segments to obtain a plurality of new mutated individuals;
and calculate the fitness of the new mutated individuals in the wandering group, and take the new mutated individual with the greatest fitness as the updated third individual.
Preferably, orienting nodes in the undirected minimum support tree to determine a preliminary minimum support tree structure includes:
for each node of the undirected minimum support tree, taking each element of the node's potential parent node set in turn as the parent node of the node and constructing the corresponding substructures; calculating the matching score of each substructure against a preset standard training data set; and taking the candidate whose substructure has the highest matching score as the parent node of the node;
and connecting each node of the undirected minimum support tree with its corresponding parent node to obtain the minimum support tree structure.
The embodiment of the invention also provides a device for identifying the abnormality of the user operation log data, which comprises a processor, a memory and a computer program stored in the memory and configured to be executed by the processor, wherein the processor realizes the method for identifying the abnormality of the user operation log data according to any one of the embodiments when executing the computer program.
The embodiment of the invention also provides a computer readable storage medium, which comprises a stored computer program, wherein when the computer program runs, equipment where the computer readable storage medium is located is controlled to execute the method for identifying the abnormality of the user operation log data according to any one of the embodiments.
Embodiments of the present invention also provide a computer program product comprising a computer program/instruction which, when executed by a processor, implements the steps of the method of any of the embodiments described above.
Compared with the prior art, the application provides a method, a device, a medium and a product for identifying abnormalities in user operation log data, in which different logs in the obtained user operation log data are taken as network nodes and the mutual information among different nodes is calculated; undirected edges are determined according to the magnitude of the mutual information between different nodes, an undirected minimum support tree is constructed, and the nodes in the undirected minimum support tree are oriented to determine a preliminary minimum support tree structure; a node sequence search is performed in the minimum support tree structure to obtain a node sequence group, different node sequences are taken as different individuals in an initial population, and the fitness of the different individuals in the initial population is calculated; the initial population is updated according to the fitness of the different individuals and a preset population optimizing strategy; and the node sequence of the individual with the greatest fitness in the updated population is taken as the user operation abnormal log. The scheme of the application can improve the optimizing efficiency and optimizing precision.
Drawings
FIG. 1 is a schematic flow chart of a method for identifying anomalies in user operation log data according to an embodiment of the present invention;
FIG. 2 is another flow chart of a method for identifying anomalies in user operation log data according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of the principle of eliminating abnormal nodes in individual variation update provided by the embodiment of the invention;
FIG. 4 is a schematic structural diagram of a device for identifying abnormality of user operation log data according to an embodiment of the present invention;
FIG. 5 is another schematic structural diagram of a user operation log data anomaly identification device according to an embodiment of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
In the prior art, abnormal operation log classification methods based on improved text classification algorithms mostly suffer from low classification efficiency and inaccurate results. They also ignore the relevance among the individual logs of a user's operations, and instead treat each piece of data as an isolated individual whose features are computed for classification. For example, when logs are classified with the DBSCAN algorithm, all data must be re-clustered every time new log data arrives, which consumes a large amount of time and makes analysis inefficient. The clustering result of the traditional K-means algorithm is random: the results differ from run to run because the randomly selected initial cluster centers differ. The algorithm must also repeatedly reassign objects and recompute the cluster centers after each adjustment, so when the data volume is very large its time cost is very high, and it cannot meet the processing requirements of massive communication service log data.
Referring to FIG. 1, a flow chart of a method for identifying anomalies in user operation log data according to an embodiment of the present invention is shown, where the method includes steps S1 to S5:
S1, taking different logs in the acquired user operation log data as network nodes, and calculating mutual information among different nodes;
S2, determining undirected edges according to the magnitude of the mutual information among different nodes, constructing an undirected minimum support tree, orienting the nodes in the undirected minimum support tree, and determining a preliminary minimum support tree structure;
S3, searching node sequences in the minimum support tree structure to obtain a node sequence group, taking different node sequences as different individuals in an initial population, and calculating the fitness of the different individuals in the initial population;
S4, updating the initial population according to the fitness of different individuals and a preset population optimizing strategy;
S5, taking the node sequence of the individual with the greatest fitness in the updated population as a user operation abnormal log.
In this embodiment, user operation log data in a communication service is continuous and follows no fixed rule or feature. As the user operates continuously or intermittently, the size of the log data set built from the logs generated by those operations is updated over time.
When identifying abnormalities in user operation log data, the logs generated by the different types of access resources and equipment of the various services in an operator enterprise have different formats, which makes unified analysis difficult. Therefore, the user operation log data is first normalized. The normalized log data includes fields such as organization attribution, user name, operation time, operation content and data sensitivity level, where the data sensitivity level represents the importance of the user operation: the higher the level, the more important the log.
The log data set generated by the operations of a user X of a certain service is denoted C(X) = {l_1, l_2, ..., l_n}. Each log is regarded as a network node, so the log data set is a Bayesian network node set, and the mutual information value between each pair of logs in C(X) is calculated according to the mutual information formula.
Bayesian networks, also known as belief networks, are an extension of Bayes' method and are currently one of the most effective theoretical models in the field of uncertain knowledge expression and reasoning. A Bayesian network is a directed acyclic graph composed of nodes representing variables and directed edges connecting those nodes, and is used to express and analyze uncertain and probabilistic events. It is commonly applied to decisions that depend conditionally on multiple control factors, where inferences can be made from incomplete, inaccurate, or uncertain knowledge or information.
Mutual information is a useful information measure in information theory. It can be seen as the amount of information that one random variable contains about another random variable, or as the reduction in the uncertainty of one random variable that results from knowing another random variable.
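The text refers to a mutual information formula without reproducing it. The standard definition for two discrete variables is I(X;Y) = Σ p(x,y) · log[ p(x,y) / (p(x) p(y)) ]. The following is a minimal sketch of that definition estimated from paired samples. How a log is encoded as a discrete variable is not stated in the text, so the list-of-symbols input and the function name `mutual_information` are illustrative assumptions.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Estimate I(X;Y) in bits from paired samples of two discrete variables.

    Assumption: each log has already been encoded as a sequence of discrete
    symbols (for example, its normalized field values); the source text does
    not specify this encoding.
    """
    if len(xs) != len(ys) or not xs:
        raise ValueError("xs and ys must be non-empty and of equal length")
    n = len(xs)
    joint = Counter(zip(xs, ys))   # counts of (x, y) pairs
    px = Counter(xs)               # marginal counts of x
    py = Counter(ys)               # marginal counts of y
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x,y) * log2( p(x,y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * log2(c * n / (px[x] * py[y]))
    return mi
```

Two identical binary sequences with balanced values share one full bit of information, while two independent ones share none.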
An exception log generally differs from ordinary operation logs, so the smaller its calculated mutual information value, the more pronounced its abnormal characteristics. Every node in the log data set C(X) is traversed and the undirected edges with the minimum mutual information are retained to construct an undirected minimum support tree.
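The construction above can be sketched as follows, reading "minimum support tree" as a minimum spanning tree over pairwise mutual information. The text does not name a construction algorithm; Kruskal's algorithm is used here as an assumption, and the function name and the edge-dictionary input are hypothetical.

```python
def minimum_support_tree(n, weights):
    """Keep the lowest-weight edges that connect n nodes without a cycle.

    weights maps (i, j) node-index pairs to a mutual information value.
    Kruskal's algorithm is an assumed choice, not taken from the source.
    """
    parent = list(range(n))  # union-find forest, one set per node

    def find(i):
        # Follow parent links to the set representative, halving the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    tree = []
    for (i, j), w in sorted(weights.items(), key=lambda kv: kv[1]):
        ri, rj = find(i), find(j)
        if ri != rj:            # edge joins two separate components
            parent[ri] = rj
            tree.append((i, j, w))
    return tree


# Toy example with four logs and made-up mutual information values.
example_weights = {(0, 1): 0.9, (0, 2): 0.1, (1, 2): 0.3, (1, 3): 0.2, (2, 3): 0.8}
example_tree = minimum_support_tree(4, example_weights)
```

For a connected input the result has n - 1 undirected edges; orientation is a separate step.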
At this point the edges between nodes in the minimum support tree structure are undirected, so the order among the nodes must be determined, i.e., the nodes must be oriented. This yields the oriented minimum support tree structure, from which the node sequence group is obtained and used as the input of population optimization.
To search for a node order in the minimum support tree structure t_1, a starting vertex a in t_1 is selected at random and a directed edge b connected to that vertex is obtained; the next node a_1 reached along that edge is then obtained. All nodes connected to the starting vertex by a path are visited in this way until no further directed edge can be obtained, which indicates that the path search is complete and yields one node order.
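One possible reading of this node order search is a depth-first walk along directed edges from the starting vertex. The sketch below assumes the oriented tree is stored as a dictionary from each node to its list of child nodes; that representation, the function name and the depth-first visiting order are assumptions, since the text does not fix them.

```python
import random


def search_node_order(children, start=None, rng=random):
    """Return one node order by following directed edges from a start vertex.

    children maps each node to the list of nodes its outgoing edges reach.
    If start is None, a starting vertex is drawn at random, as in the text.
    """
    if start is None:
        start = rng.choice(list(children))
    order, stack, seen = [], [start], set()
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        order.append(node)
        # Push children in reverse so the first listed child is visited first.
        stack.extend(reversed(children.get(node, [])))
    return order


# Toy oriented tree: a -> a1 -> a3, a -> a2
toy_tree = {"a": ["a1", "a2"], "a1": ["a3"], "a2": [], "a3": []}
toy_order = search_node_order(toy_tree, start="a")
```

Repeating the search from different starting vertices would give the different node sequences that form the node sequence group.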
The node sequence group is obtained by searching the oriented minimum support tree structure t_1; different node sequences are taken as different individuals in the initial population, and the fitness of the different individuals in the initial population is calculated.
As a preferred embodiment, the fitness of all the node sequences in G(l) is calculated by a node sequence scoring function FIT, and the calculated fitness is [formula missing in source], where i, j ∈ (1, n) and T is the weight matrix.
It should be noted that the node sequence scoring function FIT is only one way of calculating the fitness; in other embodiments, the fitness may be calculated in other ways.
An initial population is constructed from the node sequences, and its individuals are updated through the population optimizing strategy, thereby updating the initial population.
The node sequence of the individual with the greatest fitness in the updated population is taken as the optimal node sequence. All nodes in this sequence are abnormal logs, which finally realizes the classification of the abnormal logs in the log group.
In this scheme, each log generated by a user is used as a node in a Bayesian network structure, and an initial population is constructed from node sequences. Learning based on a population algorithm can, with high probability, quickly find a good solution and thereby the global optimum, so that the abnormal operation logs are determined. The method has better global convergence and higher efficiency on larger data structures.
In still another embodiment of the present invention, referring to FIG. 2, another flow chart of the method for identifying anomalies in user operation log data provided in the embodiment of the present invention is shown. When identifying abnormal data in operation logs, the following steps are executed:
The logs are normalized. Because the log formats provided by different operators differ, unified analysis is difficult, so the user operation log data is normalized first.
A log data set is constructed from the normalized logs.
The mutual information between different logs in the log data set is calculated.
The support tree structure and its orientation are determined: undirected edges are determined according to the mutual information among different nodes, an undirected minimum support tree is constructed, and the nodes in the undirected minimum support tree are oriented to determine a preliminary minimum support tree structure.
An initial population G(l) is generated.
When the initial population is updated according to the fitness of different individuals and a preset population optimizing strategy, a hunting group and a wandering group are divided from the initial population. The first n/2 node sequences are selected from the initial population G(l) as the hunting group, where n is the number of individuals in the initial population, and m node sequences outside the hunting group are randomly selected as the wandering group.
The fitness of the individuals in the hunting group and the fitness of the individuals in the wandering group are calculated.
One individual is selected from each of the hunting group and the wandering group, denoted l_h and l_w respectively.
The individuals in the initial population are updated according to the fitness of the individuals in the hunting group and the fitness of the individuals in the wandering group.
An updated population G(l)' is output, and the FIT values of all individuals in the population are calculated.
When the preset update termination condition is not met, a hunting group and a wandering group are selected from the updated initial population, and the individuals in the initial population are updated according to the fitness of the individuals in the hunting group and the fitness of the individuals in the wandering group until the update termination condition is met.
It should be noted that, as a preferred embodiment, the update termination condition may be set to the user operation no longer being updated. In other embodiments, the update termination condition may also be set to a number of update iterations.
The individuals obtained through the strategy update form a new population G(l)', and the fitness values of all its individuals are then calculated. When the user operation is no longer updated, the iteration of the algorithm has reached its maximum and the algorithm ends. The node sequence L_max = L_1, L_2, ..., L_n with the maximum fitness value in the updated population G(l)' is output, and each node in that node sequence corresponds to a user operation exception log.
Based on the log data generated by user operations, this scheme designs and builds a minimum support tree structure model and provides a method for orienting nodes by substructure scoring. Each log in the user operation data set is mapped to a Bayesian network structure, with one log regarded as one node; a minimum support tree structure is constructed through mutual information calculation and oriented using substructure matching scores. In this way, operation logs with abnormal characteristics can be effectively detected under uncertain user behavior, and the classification accuracy is improved.
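The group division and the final selection of the best node sequence in this embodiment can be sketched as below. The scoring function FIT is not reproduced in the text, so `fit` is passed in as a parameter and the example uses `sum` purely as a stand-in; the function names and the list-of-integers encoding of node sequences are likewise assumptions.

```python
import random


def split_groups(population, fit, m, rng=random):
    """Divide a population into a hunting group and a wandering group.

    The hunting group is the top n/2 individuals by fitness; the wandering
    group is m individuals drawn at random from the remainder.
    """
    ranked = sorted(population, key=fit, reverse=True)
    half = len(ranked) // 2
    hunting = ranked[:half]
    rest = ranked[half:]
    wandering = rng.sample(rest, min(m, len(rest)))
    return hunting, wandering


def best_sequence(population, fit):
    """Return the node sequence with the greatest fitness."""
    return max(population, key=fit)


# Toy population; `sum` is a placeholder for the FIT function.
toy_population = [[1, 2], [3, 4], [0, 0], [5, 5]]
toy_hunting, toy_wandering = split_groups(toy_population, sum, m=1)
toy_best = best_sequence(toy_population, sum)
```

In a full loop, the division would be repeated on each updated population until the termination condition holds, and `best_sequence` would be applied to the final population G(l)'.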
In yet another embodiment of the present invention, updating the individuals in the initial population specifically includes:
The fitness of the individuals in the wandering group is compared with the fitness of the individuals in the hunting group; individuals are updated by comparing the FIT values of the other groups with the FIT values of the individuals in the hunting group.
When the fitness of a first individual in the wandering group is greater than the fitness of a second individual in the hunting group, i.e., FIT(l_w) is better than FIT(l_h), there is an individual in the wandering group that is better than one in the hunting group, and the first individual l_w is updated into the hunting group.
When the fitness of every individual in the hunting group is not less than that of a third individual in the wandering group, the hunting group is better, and the third individual is updated according to the population strategy.
It should be noted that the population individuals may also be updated by other population updating strategies or optimizing algorithms in the prior art. This embodiment only provides a preferred implementation and is not limited to the individual updating scheme described here.
Optimization then proceeds according to the evolutionary strategy of the population algorithm to obtain the final result. This realizes mining analysis of the log data generated under uncertain user behavior, and the classification efficiency is not affected by the size of the data set.
In yet another embodiment of the present invention, when the population is updated by dividing it into a hunting group and a wandering group, individuals are further updated by adding an intake group, specifically:
K individuals, other than those in the hunting group and the wandering group, are selected from the initial population $G(l)$ as the intake group, and the fitness of the individuals in the intake group is calculated.
Any individual in the intake group (the fourth individual, $l_d$) is selected, and its fitness is compared with the fitness $\mathrm{FIT}(l_h)$ of the second individual $l_h$ in the hunting group.
When the fitness of the fourth individual is greater than that of the second individual, i.e., $l_d$ is fitter than $l_h$, the hunting group is updated: the fitness of the individuals in the initial population is recalculated, and the first number of individuals are reselected from the initial population, in descending order of fitness, as the hunting group. Otherwise, the hunting group is not updated.
Updating the population by adding the intake group improves the optimization efficiency of the population update process.
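A rough sketch of the intake-group step is given below. The function name `intake_step`, the use of a seeded `random.Random`, and the random choice of which hunter plays the role of the second individual $l_h$ are assumptions made for illustration; the text only says that any individual of the intake group is compared with a second individual of the hunting group.

```python
import random
from typing import Callable, List, Tuple

Individual = Tuple[int, ...]  # node sequence as integer node ids (assumption)


def intake_step(
    population: List[Individual],
    hunting: List[Individual],
    wandering: List[Individual],
    k: int,
    fit: Callable[[Individual], float],
    rng: random.Random,
) -> List[Individual]:
    """Return the hunting group after one intake-group comparison."""
    # Intake group: K individuals outside the hunting and wandering groups.
    others = [ind for ind in population
              if ind not in hunting and ind not in wandering]
    intake = rng.sample(others, min(k, len(others)))
    if not intake:
        return list(hunting)

    l_d = rng.choice(intake)    # "fourth individual"
    l_h = rng.choice(hunting)   # "second individual" (choice rule is an assumption)

    if fit(l_d) > fit(l_h):
        # Re-rank the whole population and reselect the hunting group.
        ranked = sorted(population, key=fit, reverse=True)
        return ranked[:len(hunting)]
    return list(hunting)        # otherwise the hunting group is unchanged
```

Keeping the hunting group the same size after reselection corresponds to reselecting "the first number of individuals" in the text.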
In another embodiment of the present invention, considering that updating only the hunting group may fall into a local optimum, a node segment mutation method for individuals is provided to increase the global search capability, specifically:
The absolute value $f_d$ of the difference between the fitness of the fourth individual $l_d$ in the intake group and the fitness of the second individual $l_h$ in the hunting group is calculated: $f_d = |\mathrm{FIT}(l_h) - \mathrm{FIT}(l_d)|$.
When $f_d$ falls within the interval determined by the fitness $\mathrm{FIT}(l_d)$ of the fourth individual and the fitness $\mathrm{FIT}(l_h)$ of the second individual, i.e., between $\mathrm{FIT}(l_h)$ and $\mathrm{FIT}(l_d)$, the fourth individual $l_d$ is updated by performing a preferential mutation between part of the gene segment of the second individual $l_h$ and the fourth individual $l_d$.
A mapping node segment is randomly selected in the fourth individual $l_d$ as the mutation region, the partial node segment at the same position in the second individual $l_h$ is selected, and it replaces the mutation region of $l_d$ to obtain the updated fourth individual $l_d'$.
Namely, $l_d' = (d_1, d_2, h_3, h_4, h_5)$ is obtained.
Mutating individuals through this mutation strategy prevents the hunting group update from falling into a local optimum and improves the global search capability.
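The segment replacement can be illustrated with a short sketch. The helper names (`within_fitness_interval`, `pick_region`, `segment_mutation`) are hypothetical, and the interval test simply transcribes the condition stated in the text under the assumption that the interval is closed.

```python
import random
from typing import List, Sequence, Tuple


def within_fitness_interval(fit_h: float, fit_d: float) -> bool:
    """True when f_d = |FIT(l_h) - FIT(l_d)| lies between the two fitness values."""
    f_d = abs(fit_h - fit_d)
    low, high = sorted((fit_h, fit_d))
    return low <= f_d <= high


def pick_region(length: int, rng: random.Random) -> Tuple[int, int]:
    """Randomly choose a non-empty half-open region [start, end) of a sequence."""
    start = rng.randrange(length)
    end = rng.randrange(start + 1, length + 1)
    return start, end


def segment_mutation(l_d: Sequence[str], l_h: Sequence[str],
                     start: int, end: int) -> List[str]:
    """Copy the segment l_h[start:end] into the same positions of l_d."""
    if len(l_d) != len(l_h):
        raise ValueError("both individuals must have the same length")
    return list(l_d[:start]) + list(l_h[start:end]) + list(l_d[end:])


l_d = ["d1", "d2", "d3", "d4", "d5"]
l_h = ["h1", "h2", "h3", "h4", "h5"]
print(segment_mutation(l_d, l_h, 2, 5))  # ['d1', 'd2', 'h3', 'h4', 'h5']
```

With the region fixed to the last three positions, the sketch reproduces the sequence $d_1, d_2, h_3, h_4, h_5$ given in the text.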
In still another embodiment of the present invention, the mutation update must ensure that the updated node sequence of the individual contains no repeated nodes; that is, the nodes in the mutated node segment must not repeat the nodes outside the replaced region, so that the updated $l_d'$ is a valid node sequence.
Because repeated nodes may arise when an individual is mutated, they are removed by illegal node mapping detection during the preferential mutation.
The nodes of the fourth individual outside the mapping node segment are compared with the nodes in the partial node segment. See FIG. 3, which is a schematic diagram of the principle of removing abnormal nodes in the individual mutation update provided by the embodiment of the invention.
The remaining nodes of the fourth individual $l_d$ outside the mapping node segment, i.e., $d_1$ and $d_2$, are compared one by one with the nodes $h_3$, $h_4$ and $h_5$ of the partial node segment in the second individual $l_h$ to check for duplicates.
It is found that node $h_3$ duplicates node $d_1$; that node is therefore removed from the partial node segment and the node $d_3$ at the corresponding position of the fourth individual is retained, giving the updated $l_d' = (d_1, d_2, d_3, h_4, h_5)$.
Removing repeated nodes by illegal node mapping detection during the preferential mutation prevents an invalid node sequence from affecting the accuracy of the final result.
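The duplicate check can be sketched as below. The function name `repair_duplicates` and the letter-valued node labels are assumptions for illustration. The sketch only covers the case described in the text, where a copied node repeats a node outside the mutation region; it does not handle the further case in which the restored original node itself collides with another copied node.

```python
from typing import List, Sequence


def repair_duplicates(l_d: Sequence[str], mutated: Sequence[str],
                      start: int, end: int) -> List[str]:
    """Undo copied nodes in mutated[start:end] that repeat a node outside the region.

    l_d     -- the individual before mutation
    mutated -- l_d with l_h[start:end] copied into the same positions
    """
    outside = set(mutated[:start]) | set(mutated[end:])
    repaired = list(mutated)
    for pos in range(start, end):
        if repaired[pos] in outside:
            # Illegal node: keep the original node of l_d at this position.
            repaired[pos] = l_d[pos]
    return repaired


# Example where h3 equals d1 (both are node "A").
l_d = ["A", "B", "C", "D", "E"]
l_h = ["B", "C", "A", "F", "G"]
mutated = l_d[:2] + l_h[2:5]                 # ['A', 'B', 'A', 'F', 'G']
print(repair_duplicates(l_d, mutated, 2, 5))  # ['A', 'B', 'C', 'F', 'G']
```

The result mirrors the pattern $d_1, d_2, d_3, h_4, h_5$: the conflicting copied node is dropped and the original node at that position is kept.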
In another embodiment of the present invention, a dynamic self-mutation strategy is provided for individual updating, with the following steps:
Calculating the dynamic mutation probability $P$ according to the dynamic self-mutation strategy;
Determining the length of the mutation node segment corresponding to the dynamic mutation probability.
The mutation node segment length is obtained by looking up the dynamic mutation probability in a preset correspondence table.
Determining the mutation node segment by node positioning according to the mutation node segment length.
Traversing each mutation position in the third individual and replacing it according to the mutation node segment, so that traversing all mutation positions of the third individual yields a plurality of new mutated individuals $l_w^1, \dots, l_w^n$;
Calculating the fitness of the new mutated individuals in the wandering group and taking the individual with the largest fitness value as the final updated individual $l_w'$, namely:
$$l_w' = \{\, l_w \mid \mathrm{FIT}(l_w) = \max\{\mathrm{FIT}(l_w^1), \mathrm{FIT}(l_w^2), \dots, \mathrm{FIT}(l_w^n)\} \,\}$$
The dynamic self-mutation strategy optimizes the population individuals to obtain the optimal result. Detection and classification of abnormal logs are realized by the population optimization method, and the proposed individual optimization strategy improves global convergence performance and search efficiency.
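The steps above might be sketched as follows. Several details are not given in the text and are filled in here purely as assumptions: the contents of the probability-to-length table, the way $P$ itself is computed (it is simply passed in), and the interpretation of "traversing and replacing each mutation position" as re-inserting the chosen segment at every other position so that each candidate remains a valid node sequence. All function names are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple


def segment_length(p: float, table: Sequence[Tuple[float, int]]) -> int:
    """Look up the segment length for probability p.

    table holds (upper_bound, length) pairs sorted by upper_bound (assumed layout).
    """
    for upper, length in table:
        if p <= upper:
            return length
    return table[-1][1]


def candidates_by_reinsertion(l_w: Sequence[int], start: int,
                              length: int) -> List[List[int]]:
    """Move the segment l_w[start:start+length] to every other position."""
    seg = list(l_w[start:start + length])
    rest = list(l_w[:start]) + list(l_w[start + length:])
    out = []
    for pos in range(len(rest) + 1):
        if pos == start:
            continue  # this position would reproduce the original individual
        out.append(rest[:pos] + seg + rest[pos:])
    return out


def best_mutant(cands: List[List[int]],
                fit: Callable[[Sequence[int]], float]) -> List[int]:
    """Pick l_w' as the candidate with the largest fitness."""
    return max(cands, key=fit)
```

For instance, with a toy fitness `fit = lambda ind: sum(i * v for i, v in enumerate(ind))`, the call `best_mutant(candidates_by_reinsertion([1, 2, 3, 4], 0, 2), fit)` returns `[3, 1, 2, 4]`.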
In yet another embodiment provided by the present invention, the following steps are performed when determining the preliminary minimum support tree structure by node orientation:
For each node of the undirected minimum support tree, let the set of potential parent nodes of a node $l_i$ in the minimum support tree structure $t$ be $M$. A sub-structure $S_i$ is constructed with each element of $M$ in turn as the parent node of $l_i$, and the matching degree $r$ between the sub-structure and the standard training data set $D$ is calculated, wherein the calculation formula is as follows:
where $N$ is the number of nodes in the training data set $D$, and $m_j$ is the $j$-th node in the training data set $D$.
The node in the highest-scoring sub-structure is taken as the parent node of $l_i$, and all nodes in the support tree structure $t$ are updated in this way to obtain the updated parent node set.
Each node of the undirected minimum support tree is connected with its corresponding parent node to obtain the minimum support tree structure.
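The parent selection can be sketched in a few lines. Because the matching-degree formula is not reproduced in this text, the score is passed in as a caller-supplied function; `orient_tree`, `candidates` and `score` are hypothetical names, and the sketch does not check that the resulting directed structure is acyclic.

```python
from typing import Callable, Dict, Hashable, List

Node = Hashable


def orient_tree(
    candidates: Dict[Node, List[Node]],
    score: Callable[[Node, Node], float],
) -> Dict[Node, Node]:
    """Choose, for each node, the potential parent whose sub-structure scores highest.

    candidates maps a node l_i to its potential parent set M.
    score(node, parent) stands in for the matching degree r of the
    sub-structure against the training data set D (placeholder).
    """
    parents: Dict[Node, Node] = {}
    for node, potential in candidates.items():
        if potential:  # nodes with no potential parent stay roots
            parents[node] = max(potential, key=lambda p: score(node, p))
    return parents
```

Each entry `node -> parent` of the returned mapping corresponds to one directed edge of the oriented tree.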
In yet another embodiment of the present invention, mutual information is a statistic that measures the correlation between two random variables. Since the logs generated by user operations are random variables, mutual information can be used to calculate the degree of correlation between nodes. The mutual information calculation formula is specifically as follows:
wherein $I(l_i, l_j)$ is the mutual information between the $i$-th node and the $j$-th node, $P(l_i, l_j)$ is the joint probability of the $i$-th node and the $j$-th node, $P(l_i)$ and $P(l_j)$ are the marginal probabilities of the $i$-th and $j$-th nodes respectively, $i, j = 1, 2, \dots, n$, and $n$ is the number of nodes.
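For reference, the standard definition of mutual information between two discrete variables is $I(X;Y) = \sum_{x,y} P(x,y)\,\log\frac{P(x,y)}{P(x)\,P(y)}$, which uses exactly the joint and marginal probabilities named above. The sketch below estimates it from paired observations. How each log is turned into a sequence of discrete observations is not specified in the text, so the input encoding and the function name `mutual_information` are assumptions, and the natural logarithm is an arbitrary choice of base.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def mutual_information(xs: Sequence[Hashable], ys: Sequence[Hashable]) -> float:
    """Empirical mutual information (in nats) between two paired discrete samples."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("xs and ys must be non-empty and of equal length")
    n = len(xs)
    joint = Counter(zip(xs, ys))   # counts for P(l_i, l_j)
    px = Counter(xs)               # counts for P(l_i)
    py = Counter(ys)               # counts for P(l_j)
    mi = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        mi += p_xy * math.log(p_xy / ((px[x] / n) * (py[y] / n)))
    return mi
```

Two identical binary sequences such as `[0, 0, 1, 1]` give $\ln 2$, while the independent pair `[0, 0, 1, 1]` and `[0, 1, 0, 1]` gives 0, so larger values indicate more strongly correlated nodes.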
In this scheme, the correlation among the logs generated by the user is taken into account. Each log is used as a node in a Bayesian network structure, an initial population is built from node sequences, and individuals are updated through a population optimization strategy to obtain an optimal node sequence. All nodes in that sequence are abnormal logs, so the abnormal logs in the log group are thereby classified. Operation logs with abnormal characteristics can be effectively detected under uncertain user behavior, and the classification accuracy is improved.
Detection and classification of abnormal logs are realized by a population optimization method, and the proposed individual optimization strategy improves global convergence performance and search efficiency. The scheme can subsequently be applied to diagnosis tasks such as system fault node detection.
Referring to fig. 4, a schematic structural diagram of a device for identifying abnormality of user operation log data according to an embodiment of the present invention is provided, where the device includes:
The mutual information calculation module is used for taking different logs in the acquired user operation log data as network nodes and calculating the mutual information among different nodes;
The support tree determining module is used for determining undirected edges according to the magnitude of the mutual information among different nodes, constructing an undirected minimum support tree, orienting the nodes in the undirected minimum support tree, and determining a preliminary minimum support tree structure;
The population determining module is used for searching node sequences in the minimum support tree structure to obtain a node sequence group, taking different node sequences as different individuals in the initial population, and calculating the fitness of the different individuals in the initial population;
The population updating module is used for updating the initial population according to the fitness of different individuals and a preset population optimization strategy;
The result output module is used for taking the node sequence of the individual with the greatest fitness in the updated population as the user operation abnormal log.
It should be noted that, the device for identifying abnormal user operation log data provided in this embodiment can execute all the steps and functions of the method for identifying abnormal user operation log data provided in any one of the above embodiments, and specific functions of the device are not described herein.
Referring to FIG. 5, another schematic structural diagram of a device for identifying abnormality of user operation log data according to an embodiment of the present invention is shown. The user operation log data anomaly identification device comprises: a processor, a memory, and a computer program stored in the memory and executable on the processor, such as a user operation log data anomaly identification program. When executing the computer program, the processor implements the steps in the above embodiments of the method for identifying abnormal user operation log data, for example, steps S1 to S5 shown in FIG. 1. Alternatively, the processor, when executing the computer program, performs the functions of the modules in the above apparatus embodiments.
For example, the computer program may be divided into one or more modules, which are stored in the memory and executed by the processor to accomplish the present invention. The one or more modules may be a series of computer program instruction segments capable of performing specific functions, and the instruction segments describe the execution of the computer program in the user operation log data anomaly identification device. The specific functions of each module are described in detail in the method for identifying abnormal user operation log data provided in any of the foregoing embodiments and are not repeated here.
The user operation log data anomaly identification device may be a desktop computer, a notebook computer, a palmtop computer, a cloud server, or other computing equipment. The device may include, but is not limited to, a processor and a memory. It will be appreciated by those skilled in the art that the schematic diagram is merely an example of a user operation log data anomaly identification device and does not limit it; the device may include more or fewer components than those illustrated, may combine some components, or may use different components. For example, the device may further include an input/output device, a network access device, a bus, and the like.
The processor may be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor or any conventional processor. The processor is the control center of the user operation log data anomaly identification device and connects the parts of the entire device using various interfaces and lines.
The memory may be used to store the computer program and/or the modules, and the processor implements various functions of the device by running or executing the computer program and/or modules stored in the memory and invoking data stored in the memory. The memory may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application program required for at least one function (such as a sound playing function, an image playing function, etc.), and the like; the data storage area may store data created according to the use of the device (such as audio data, a phonebook, etc.). In addition, the memory may include high-speed random access memory, and may also include non-volatile memory, such as a hard disk, memory, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, a flash card, at least one disk storage device, a flash memory device, or other volatile solid-state storage device.
The modules integrated in the user operation log data anomaly identification device may be stored in a computer-readable storage medium if they are implemented in the form of software functional units and sold or used as a separate product. Based on such understanding, the present invention may implement all or part of the flow of the method of the above embodiments by instructing related hardware through a computer program. The computer program may be stored in a computer-readable storage medium, and when executed by a processor, it implements the steps of each of the method embodiments described above. The computer program comprises computer program code, which may be in source code form, object code form, an executable file, or some intermediate form. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and so forth.
Embodiments of the present invention also provide a computer program product comprising a computer program/instruction which, when executed by a processor, implements the steps of the method of any of the embodiments described above.
It should be noted that, the computer program product provided in this embodiment can execute all the steps and functions of the method for identifying abnormal user operation log data provided in any one of the above embodiments, and specific functions of the apparatus are not described herein.
It should be noted that modifications and adaptations to the invention may occur to one skilled in the art without departing from the principles of the present invention and are intended to be within the scope of the present invention.
Claims (10)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410857621.5A CN118869264A (en) | 2024-06-28 | 2024-06-28 | A method, device, medium and product for identifying abnormalities in user operation log data |
Publications (1)
Publication Number | Publication Date |
---|---|
CN118869264A true CN118869264A (en) | 2024-10-29 |
Family
ID=93172757
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202410857621.5A Pending CN118869264A (en) | 2024-06-28 | 2024-06-28 | A method, device, medium and product for identifying abnormalities in user operation log data |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN118869264A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN119728313A (en) * | 2025-03-03 | 2025-03-28 | 深圳市悦道科技有限公司 | A network security management method based on communication data processing |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20150106324A1 (en) * | 2013-10-11 | 2015-04-16 | Accenture Global Services Limited | Contextual graph matching based anomaly detection |
CN117391204A (en) * | 2023-10-24 | 2024-01-12 | 江南大学 | A hybrid Bayesian network structure learning method based on mutual information guidance |
CN118152962A (en) * | 2024-03-28 | 2024-06-07 | 国电南瑞南京控制系统有限公司 | A method and system for detecting abnormality in power monitoring operation data |
CN118171129A (en) * | 2024-05-11 | 2024-06-11 | 中移(苏州)软件技术有限公司 | User data acquisition method, system, electronic device, chip and medium |
Non-Patent Citations (1)
Title |
---|
苏昭玉: "改进贝叶斯网络算法及篦冷机故障诊断的研究", 《中国优秀硕士学位论文全文数据库工程科I辑》, 15 March 2022 (2022-03-15), pages 7 - 11 * |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||