Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth such as the particular system architecture, techniques, etc., in order to provide a thorough understanding of the embodiments of the present application. It will be apparent, however, to one skilled in the art that the present application may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present application with unnecessary detail.
It should be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It should also be understood that the term "and/or" as used in the present specification and the appended claims refers to any and all possible combinations of one or more of the associated listed items, and includes such combinations.
As used in the present description and the appended claims, the term "if" may be interpreted as "when" or "once" or "in response to determining" or "in response to detecting," depending on the context. Similarly, the phrase "if it is determined" or "if [a described condition or event] is detected" may be interpreted, depending on the context, as "upon determining" or "in response to determining" or "upon detecting [the described condition or event]" or "in response to detecting [the described condition or event]".
Furthermore, the terms "first," "second," "third," and the like in the description of the present specification and in the appended claims, are used for distinguishing between descriptions and not necessarily for indicating or implying a relative importance.
Reference in the specification to "one embodiment" or "some embodiments" or the like means that a particular feature, structure, or characteristic described in connection with the embodiment is included in one or more embodiments of the application. Thus, appearances of the phrases "in one embodiment," "in some embodiments," "in other embodiments," and the like in the specification are not necessarily all referring to the same embodiment, but mean "one or more but not all embodiments" unless expressly specified otherwise. The terms "comprising," "including," "having," and variations thereof mean "including but not limited to," unless expressly specified otherwise.
Current mainstream schemes for detecting abnormal behavior on the Internet fall mainly into three types: traffic analysis, security information and event management (SIEM), and product integration. Each of these three types of solutions has advantages and disadvantages, but in general a security expert lists microscopic security event information (such as abnormal behavior events of a user) and summarizes the associated risk control rules; in subsequent detection, the risk control rules are matched to detect abnormal user behavior. In addition, abnormal behavior detection is also performed through multi-factor authentication (MFA), which refers to combined authentication using multiple factors such as a password, gesture code, mobile phone short message, USB key, fingerprint, and facial features.
However, with the rapid development of informatization in every industry, the above methods have the following defects. (1) Management through risk control rules has poor adaptability and flexibility: preset risk control rules are difficult to adapt to all users, especially users from different regions and devices, and when a hacker's attack model changes, the risk control rules are difficult to adjust in time. (2) Risk control rules rarely cover the full range of risk identification categories. For example, terminal devices (such as a PC or a mobile phone) that have no identity but may still exhibit attack behavior are often not included as a risk identification dimension when the risk control rules are constructed. (3) In multi-factor authentication (MFA), the cost of forging each authentication factor is low, the probability of leakage keeps increasing as authentication factors are used more often, and the authentication process is relatively cumbersome, which seriously affects the user experience and reduces the user's working efficiency.
Therefore, the above methods may lead to inaccurate detection of abnormal user behavior.
In order to improve the accuracy of abnormal behavior detection, the application provides an abnormal behavior detection method. The method comprises determining a real-time baseline of real-time user access data by using a pre-trained user behavior model, wherein the pre-trained user behavior model is obtained by training on historical user access data, the user corresponding to the historical user access data and the user corresponding to the real-time user access data have the same user portrait, and the real-time baseline indicates the average deviation level of the real-time user access data relative to the historical user access data; and then performing abnormal behavior detection on the real-time user access data according to the real-time baseline to obtain an abnormal behavior detection result.
Fig. 1 shows a flow chart of an abnormal behavior detection method according to an embodiment of the present application, which is described in detail below:
S11, acquiring real-time user access data.
The real-time user access data refers to data generated while a user accesses the service system, and comprises one or more of user identity information, access device information, access behavior information, access service information, and the like. The user identity information may include a name, an account number, a mobile phone number, etc.; the access device information may include a device name, a device IP, a device User Agent (UA), etc.; the access behavior information may include a login behavior, an authentication behavior, a browsing behavior, etc.; and the access service information may include browsed service data, clicked service data, etc.
Specifically, when a user accesses a service system (such as an information system built within an enterprise for carrying out daily business operations) through a terminal device (such as a computer or a mobile phone), the real-time user access data generated during the access can be collected through data tracking points (event instrumentation) or through data push, so that the real-time user access data is obtained in a timely manner.
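The four categories of real-time user access data described above can be illustrated with a small record type. The following is a minimal sketch only; the class name `AccessRecord`, the function `on_access_event`, and all field names are hypothetical and are not defined by the application.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class AccessRecord:
    """One real-time user access record (all field names are illustrative)."""
    account: str              # user identity information
    device_name: str          # access device information
    device_ip: str
    user_agent: str
    behavior: str             # access behavior, e.g. "login", "authentication", "browse"
    service_data: str = ""    # access service information, e.g. the page browsed or clicked
    timestamp: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )


def on_access_event(event: dict) -> AccessRecord:
    """Convert one raw tracking-point or push event (a plain dict) into a record."""
    return AccessRecord(
        account=event["account"],
        device_name=event.get("device_name", "unknown"),
        device_ip=event.get("device_ip", ""),
        user_agent=event.get("user_agent", ""),
        behavior=event["behavior"],
        service_data=event.get("service_data", ""),
    )
```

In this sketch, missing optional fields fall back to defaults so that a partially populated event can still be recorded.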
S12, determining a real-time access result of the real-time user access data by using a pre-trained user behavior model, wherein the real-time access result at least comprises the real-time access behavior corresponding to the real-time user access data, the pre-trained user behavior model is obtained by training on historical user access data, and the user corresponding to the historical user access data and the user corresponding to the real-time user access data have the same user portrait.
The pre-trained user behavior model may be a machine learning model or a neural network model, and is used to output the real-time access behavior of the user. A user portrait is a virtual user model constructed from user access data, and may comprise user attribute data such as user characteristics (for example, age, gender, account number, education level) and user behaviors (for example, login behavior, access behavior, viewing behavior). Users with the same characteristics or behavior patterns can be found through the user portrait.
Specifically, after the real-time user access data is obtained, a pre-trained user behavior model can be matched according to the real-time user access data, and the real-time access result corresponding to the real-time user access data can be obtained from the matched model. The pre-trained user behavior model is trained on the historical user access data, and the user corresponding to the historical user access data and the user corresponding to the real-time user access data have the same user portrait. In other words, the model is trained on the access data of users with the same user characteristics or behavior patterns, so it can accurately output the real-time access behavior of the current user. The user access behavior can comprise the various behavior relations generated during user access, including at least one of the relations between the user and the access device, the user and the access IP, the user and the access time, the user and the access authority, the user and the access region, the user and the usage duration, and the like. The real-time access result may further include a quantified value reflecting the real-time access behavior of the current user, for example a user behavior value or a real-time baseline of the user, which is not limited herein.
It should be noted that, in order to improve the prediction accuracy of the pre-trained user behavior model, data preprocessing may be performed on the real-time user access data, including data cleaning, data classification, data fusion, and the like. Data classification is used to classify unquantifiable data according to the actual situation. For example, when a person in an enterprise encounters an abnormal problem, similar problems encountered later by other persons in the enterprise are classified according to their degree of difference from that abnormal problem. Data fusion is used to fuse real-time user access data from different fields.
S13, detecting abnormal behaviors of the real-time user access data according to the real-time access result and the historical access result corresponding to the historical user access data, and obtaining an abnormal behavior detection result.
Specifically, the real-time access result can be compared with the historical access result, and the degree of deviation of the real-time user access data is judged from the comparison result. If the comparison shows that the degree of deviation is small, the abnormal behavior detection result is that the behavior is normal; otherwise, the abnormal behavior detection result is that the behavior is abnormal, and an alarm is raised. It should also be noted that the historical access result may be determined by the pre-trained user behavior model in the same way as the real-time access result, which is not described again here.
The application detects abnormal behavior in the real-time user access data by using the real-time access result and the historical access result, which can improve the accuracy of abnormal behavior detection. Specifically, after the real-time user access data is obtained, a pre-trained user behavior model is used to determine a real-time access result of the real-time user access data, wherein the real-time access result at least comprises the real-time access behavior corresponding to the real-time user access data. The pre-trained user behavior model is obtained by training on historical user access data, so no risk control rules need to be preset, which improves the flexibility of abnormal behavior detection. Meanwhile, the user corresponding to the historical user access data and the user corresponding to the real-time user access data have the same user portrait, which means that the pre-trained user behavior model is better adapted to the real-time user access data. The real-time access result can therefore be determined more accurately by the model, which in turn improves the accuracy of detecting abnormal user behavior from the real-time access result and the historical access result corresponding to the historical user access data.
In another optional embodiment of the present application, before determining the real-time access result of the real-time user access data by using the pre-trained user behavior model, the method further includes:
Acquiring the historical user access data in a preset time period, and constructing a knowledge graph based on the historical user access data in the preset time period, wherein the knowledge graph is used for indicating the corresponding relation between different historical user access data in the preset time period;
Determining historical behavior data according to the knowledge graph and preset behavior types, wherein the historical behavior data comprise user attribute data of different behavior types;
Constructing different types of user portraits according to the historical behavior data, and training a pre-constructed decision tree model by using the different types of user portraits to obtain the pre-trained user behavior models corresponding to the different types of user portraits.
Specifically, historical user access data within a first preset time period can be obtained from historical logs and preprocessed. User attribute data and the relationships among the user attribute data are then extracted from the preprocessed historical user access data, and the knowledge graph is built from them. The knowledge graph is further divided by behavior according to preset behavior types to obtain the corresponding historical behavior data, multiple types of user portraits are built from the historical behavior data, and a pre-constructed decision tree model is trained on the different types of user portraits to obtain the user behavior models corresponding to the different types of user portraits. For the data preprocessing, reference may be made to the above embodiments, which is not repeated here.
For example, the historical user access data within a first preset time period (for example, within one week) can be obtained from structured and/or unstructured historical logs and preprocessed. User attribute data (for example, common region, account number, device information, accessed system, and user behavior) and the association relations among the user attribute data (for example, when a device logs in from region A, the IP corresponding to that device is associated with region A) can then be extracted from the preprocessed historical user access data. The user attribute data are taken as entities and the association relations among them as relation edges to construct the corresponding knowledge graph. The user attribute data and association relations in the knowledge graph are then summarized according to preset behavior types to obtain the historical behavior data corresponding to each behavior type (for example, the historical behavior data corresponding to behavior type A comprises the user attribute data for the common region, user device information, accessed system, and behavior type, together with the association relations among those user attribute data). Finally, different types of user portraits are constructed from the historical behavior data and the corresponding user behavior models are trained on them, wherein a user portrait constructed from the historical behavior data comprises data reflecting user characteristics or behavior patterns, such as behavior type, user information, and device information.
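The entity and relation-edge construction described above can be sketched with an adjacency map. This is an illustrative sketch under stated assumptions: the record keys (`account`, `device`, `ip`, `region`), the relation names, and both function names are hypothetical, and a production system would more likely use a dedicated graph store.

```python
from collections import defaultdict


def build_knowledge_graph(records):
    """Build {entity: set of (relation, entity)} from preprocessed access records.

    Entities are (kind, value) tuples; relation names are illustrative only.
    """
    graph = defaultdict(set)
    for r in records:
        account = ("account", r["account"])
        device = ("device", r["device"])
        ip = ("ip", r["ip"])
        region = ("region", r["region"])
        graph[account].add(("uses_device", device))
        graph[account].add(("common_region", region))
        graph[device].add(("has_ip", ip))
        # A device logging in from a region associates its IP with that region.
        graph[ip].add(("located_in", region))
    return graph


def behavior_data(graph, relations):
    """Collect the edges whose relation belongs to one preset behavior type."""
    return sorted(
        (src, rel, dst)
        for src, edges in graph.items()
        for rel, dst in edges
        if rel in relations
    )


records = [{"account": "alice", "device": "pc-1", "ip": "10.0.0.5", "region": "A"}]
graph = build_knowledge_graph(records)
login_behavior = behavior_data(graph, {"uses_device", "common_region"})
```

Here `login_behavior` plays the role of the historical behavior data for one behavior type: a subset of entities and edges selected from the graph.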
According to the embodiment of the application, through knowledge graph and user portrait technology, historical access data can be better summarized, and training data (namely user portrait) truly reflecting user characteristics and behavior modes can be found, so that the training accuracy of a pre-constructed decision tree model is improved.
In an alternative embodiment of the present application, the pre-constructed decision tree model may be one of a random forest model, an XGBoost model, a LightGBM model, etc., which is not limited herein.
Correspondingly, in the case of training a user behavior model through a user portrait, the determining a real-time access result of the real-time user access data by using the pre-trained user behavior model includes:
Determining a user portrait corresponding to the real-time user access data, and matching the pre-trained user behavior model according to the user portrait corresponding to the real-time user access data;
Taking the real-time user access data as the input of the matched pre-trained user behavior model, and outputting the real-time access result corresponding to the real-time user access data by using the matched pre-trained user behavior model.
Specifically, the user portraits corresponding to the real-time user access data can be determined by constructing a knowledge graph, then the types of the user portraits are matched according to the user portraits corresponding to the real-time user access data, a corresponding pre-trained user behavior model is determined according to the matched user portraits types, and the real-time access results corresponding to the real-time user access data are output by utilizing the matched pre-trained user behavior model. The process of determining the user portrait by constructing the knowledge graph may refer to the above embodiment, which is not described herein.
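The match-then-predict flow above can be sketched as a lookup in a registry of models keyed by portrait type. This is a simplified sketch: the application determines the portrait by constructing a knowledge graph, whereas the hypothetical `portrait_type` function here simply reads an assumed `enterprise_type` field, and the models are stand-in callables.

```python
from typing import Callable, Dict

# Hypothetical model interface: maps one access record to an access-result dict.
Model = Callable[[dict], dict]


def portrait_type(record: dict) -> str:
    """Stand-in for portrait matching; keyed on an assumed 'enterprise_type' field."""
    return record.get("enterprise_type", "default")


def real_time_access_result(record: dict, models: Dict[str, Model]) -> dict:
    """Pick the model trained for the record's portrait type and run it."""
    model = models.get(portrait_type(record), models["default"])
    return model(record)


# Stand-in models; real ones would be trained decision tree models.
models: Dict[str, Model] = {
    "A": lambda r: {"portrait": "A", "device": r.get("device")},
    "default": lambda r: {"portrait": "default", "device": r.get("device")},
}
result = real_time_access_result({"enterprise_type": "A", "device": "pc-1"}, models)
```

Falling back to a `default` model when no portrait type matches is a design choice of this sketch, not something the application specifies.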
In the embodiment of the application, the user portrait of the real-time user access data is used to match the pre-trained user behavior model, so that the real-time user access data can be matched more accurately with a suitable pre-trained user behavior model.
In some embodiments, the building of different types of user portraits based on the historical behavior data, the training of the pre-built decision tree model using different types of the user portraits, comprises:
Clustering the historical behavior data, and obtaining different types of user portraits according to a clustering result, wherein the user portraits comprise association relations between users and the user attribute data;
for any type of the user portraits, selecting user attribute data from the user portrait as a root node according to the association relations, calculating a histogram corresponding to the root node, determining a splitting feature of the root node and a splitting point corresponding to the splitting feature according to the histogram, and splitting the root node according to the splitting feature and its splitting point to obtain split leaf nodes;
for any one of the leaf nodes, calculating a histogram corresponding to the leaf node, determining a new splitting feature and a new splitting point according to that histogram, splitting the leaf node by using the new splitting feature and the new splitting point to obtain new leaf nodes, and returning to the step of calculating the histogram corresponding to the leaf node until a preset splitting condition is met.
Specifically, the historical behavior data can be grouped into clusters of different cluster types by a preset clustering algorithm, and the user attribute data in each cluster are then associated with users to obtain the user portrait for that cluster type, wherein the cluster types comprise one of, or a combination of, user type, region type, device type, enterprise type, behavior type, and the like. Optionally, the preset clustering algorithm may be one of a k-means clustering algorithm, a hierarchical clustering algorithm, a density-based clustering algorithm, and the like, which is not limited herein. For example, assuming that the cluster type is enterprise type, clustering the historical behavior data yields user portraits of three enterprise types (enterprise type A, enterprise type B, and enterprise type C). The user portrait of each enterprise type comprises the association relations between users and different user attribute data, which may include the relations between account and device, account and IP, account and common region, account and user rights, account and usage duration, and the like.
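As an illustration of the clustering step, the following is a minimal pure-Python k-means sketch. It assumes that the historical behavior data have already been encoded as equal-length numeric feature tuples (the encoding is not specified by the application), and it uses the first k points as initial centroids purely so that the run is deterministic. The function names are hypothetical.

```python
def kmeans(points, k, iterations=20):
    """Plain k-means on numeric tuples; returns one cluster label per point."""
    centroids = [points[i] for i in range(k)]  # deterministic init: first k points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


def portraits_by_cluster(users, labels):
    """Group users by cluster label; each group stands for one portrait type."""
    groups = {}
    for user, lab in zip(users, labels):
        groups.setdefault(lab, []).append(user)
    return groups


users = ["alice", "bob", "carol", "dave"]
features = [(1.0, 1.0), (9.0, 9.0), (1.2, 0.8), (8.8, 9.1)]  # made-up encodings
labels = kmeans(features, k=2)
portraits = portraits_by_cluster(users, labels)
```

Each resulting group would then be enriched with the association relations (account and device, account and IP, and so on) to form the user portrait for that cluster type.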
When training a pre-constructed decision tree model, an optimal feature (that is, user attribute data) is first determined as the root node according to a preset decision tree algorithm and the association relations between users and user attributes. The feature values at the root node are discretized into a fixed number (for example, k) of intervals (bins) to obtain a histogram of width k, and the discretized bin values are used as indexes to accumulate the statistics of the histogram. According to the accumulated statistics, the splitting feature (that is, new user attribute data) and the splitting point of the user attribute data at the root node are found, and the root node is split at that splitting point to obtain leaf nodes. Then, for each leaf node, a new splitting feature and a new splitting point are determined from the corresponding histogram, and the leaf node is split accordingly. This process is iterated until a preset number of iterations is reached, and the decision tree formed by the resulting root node and leaf nodes is taken as the pre-trained user behavior model. During leaf node splitting, a leaf-wise growth strategy can be adopted: among the current leaf nodes, the one with the largest splitting gain (generally, the one whose splitting feature corresponds to the largest data volume) is found according to the histograms and split, which reduces the calculation cost. Optionally, the preset decision tree algorithm may be the classification and regression tree (CART) algorithm.
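The histogram-based search for a splitting point can be sketched for a single numeric feature as follows. This is a simplified illustration rather than the application's exact procedure: the gain used here is a variance-reduction score computed from per-bin target sums, which is an assumption, and the function name and the default of four bins are hypothetical.

```python
def histogram_split(values, targets, num_bins=4):
    """Find the best split of one feature by scanning a fixed-width histogram.

    Returns (threshold, gain), or (None, 0.0) if no split improves the gain.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / num_bins or 1.0  # avoid zero width for constant features
    counts = [0] * num_bins
    sums = [0.0] * num_bins
    # Discretize each value into a bin and accumulate per-bin statistics.
    for v, t in zip(values, targets):
        b = min(int((v - lo) / width), num_bins - 1)
        counts[b] += 1
        sums[b] += t
    total_n, total_s = sum(counts), sum(sums)
    best_threshold, best_gain = None, 0.0
    left_n, left_s = 0, 0.0
    # Scan bin boundaries instead of every distinct feature value.
    for b in range(num_bins - 1):
        left_n += counts[b]
        left_s += sums[b]
        right_n, right_s = total_n - left_n, total_s - left_s
        if left_n == 0 or right_n == 0:
            continue
        gain = left_s**2 / left_n + right_s**2 / right_n - total_s**2 / total_n
        if gain > best_gain:
            best_threshold, best_gain = lo + (b + 1) * width, gain
    return best_threshold, best_gain


threshold, gain = histogram_split(
    [1, 2, 3, 4, 11, 12, 13, 14], [0, 0, 0, 0, 1, 1, 1, 1]
)
```

Because only k - 1 bin boundaries are scanned rather than every distinct value, the cost per feature depends on the number of bins, which is the saving that the histogram approach aims for. Under a leaf-wise strategy, this search would be run for each current leaf and only the leaf with the largest gain would be split.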
In the embodiment of the application, the histogram and the leaf-wise growth strategy reduce the complexity and calculation cost of computing leaf node splits in the decision tree, so that the training efficiency of the decision tree model can be improved.
It should be further noted that a trained user behavior model may be retrained at regular intervals (for example, once a week), so that it is continuously updated and better fits the actual service scenario.
In some embodiments, the historical access result includes a historical access behavior, and the performing abnormal behavior detection on the real-time user access data according to the real-time access result and the historical access result corresponding to the historical user access data to obtain an abnormal behavior detection result includes:
comparing whether the real-time access behavior is consistent with the historical access behavior, and determining the abnormal behavior detection result according to the comparison result.
Specifically, the historical access behavior most recently output by the user behavior model at the previous moment (or in a previous time period) can be used as the comparison standard. If fewer than n differences are found when comparing the real-time access behavior with the historical access behavior, the abnormal behavior detection result is determined to be normal; otherwise, the abnormal behavior detection result is determined to be abnormal and an alarm is raised. Here n is a positive integer greater than or equal to 1.
For example, assuming that n is set to 2, if comparing the real-time access behavior with the historical access behavior shows that both the login device and the login time of the user are inconsistent, the user's behavior can be determined to be abnormal and an alarm raised.
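The comparison rule with threshold n can be sketched as counting the behavior relations that differ. The function name and the dictionary keys are hypothetical; only the rule itself (fewer than n differences is normal, otherwise abnormal) comes from the description.

```python
def detect_by_behavior(real_time: dict, historical: dict, n: int = 2):
    """Compare behavior relations; fewer than n differences counts as normal.

    Returns (result, differing_keys), where result is "normal" or "abnormal".
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    diffs = [key for key in historical if real_time.get(key) != historical[key]]
    result = "normal" if len(diffs) < n else "abnormal"
    return result, diffs


historical = {"device": "pc-1", "login_time": "09:00", "region": "A"}
real_time = {"device": "phone-7", "login_time": "03:00", "region": "A"}
result, diffs = detect_by_behavior(real_time, historical, n=2)
```

With n set to 2, the two mismatches (login device and login time) reach the threshold, so the sketch reports abnormal behavior, matching the example in the text.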
In the embodiment of the application, the user behavior model is trained on data from users with the same user portrait, which ensures that the real-time access behavior and the historical access behavior both come from users with the same user portrait. Using the historical access behavior previously output by the user behavior model as the comparison standard can therefore improve the accuracy of abnormal behavior detection.
In another alternative embodiment of the present application, user behavior may change suddenly because of service requirements (for example, the access region and access time change because of a temporary business trip), and the pre-trained user behavior model may not have been updated in time. Therefore, before the abnormal behavior detection is performed on the real-time user access data according to the real-time access result and the historical access result corresponding to the historical user access data, the method further comprises:
Performing model verification on the pre-trained user behavior model according to real-time verification data to obtain a real-time model verification result, wherein the real-time verification data comprises the real-time user access data, and the real-time model verification result reflects the performance of the pre-trained user behavior model.
Specifically, the real-time user access data, together with the historical user access data in a second preset time period preceding it, can be selected as the real-time verification data. Model verification is performed on the user behavior model with the real-time verification data, and the output of the user behavior model is taken as the real-time model verification result. Because the real-time verification data comprise the real-time user access data at the current moment and the historical user access data in the second preset time period before the current moment, the verification uses real service data closer to the current moment, so the real-time model verification result can evaluate the performance of the user behavior model at the current moment more accurately.
It should be noted that, to improve accuracy during user behavior model verification, the second preset time period used to select historical user access data for verification may be less than or equal to the first preset time period used to select historical user access data for training, so that a sufficient data volume is available during verification and inaccurate performance verification caused by insufficient data is avoided. For example, historical user access data for the past week may be selected when the user behavior model is trained, and historical user access data for the past three days may be selected when the user behavior model is verified.
Optionally, the model verification includes one of holdout validation, cross-validation (for example, k-fold cross-validation), etc., which is not limited herein.
In some embodiments, the real-time access result further includes a real-time baseline corresponding to the real-time user access behavior, where the real-time baseline is used to indicate an average level of the real-time access behavior corresponding to the real-time user access data;
correspondingly, the detecting abnormal behavior of the real-time user access data according to the real-time access result and the historical access result corresponding to the historical user access data includes:
determining the real-time baseline according to the real-time model verification result under the condition that the real-time model verification result indicates that the performance of the pre-trained user behavior model passes;
and detecting abnormal behaviors of the real-time user access data according to the real-time base line and a historical base line corresponding to the historical user access data, wherein the historical base line is used for indicating the average level of the historical access behaviors corresponding to the historical user access data.
Specifically, in the case that the real-time model verification result indicates that the performance of the pre-trained user behavior model passes, since the user access data in a more recent period can be used in the model verification process, a more accurate real-time baseline, that is, a quantized value reflecting the average level of the current user real-time access behavior can be obtained according to the real-time model verification result. At this time, the historical baseline at the previous moment can be obtained as a comparison standard, and abnormal behavior detection is performed on the real-time user access data according to the deviation degree of the real-time baseline and the historical baseline, wherein the higher the deviation degree of the real-time baseline and the historical baseline is, the greater the risk of abnormal behavior of the user is indicated.
It should be noted that, when the real-time model verification result indicates that the performance of the pre-trained user behavior model does not pass, an alarm may be raised and the judgment made manually.
In some embodiments, the performing model verification on the pre-trained user behavior model according to the real-time verification data to obtain a real-time model verification result includes:
Dividing the real-time verification data into k verification data sets, and instantiating the pre-trained user behavior model into k identical user behavior models, wherein k is a positive integer greater than 1;
for any user behavior model, sequentially taking 1 verification data set as a verification set, taking the rest k-1 verification data sets as training sets, training the user behavior model by using the training sets, and outputting a verification result corresponding to the verification set by using the trained user behavior model;
And determining the real-time model verification result according to k verification results.
Specifically, the real-time verification data are divided equally into k verification data sets, denoted Y1 to Yk, and the pre-trained user behavior model is instantiated into k identical user behavior models, where k is a positive integer greater than 1. For the first user behavior model, verification data set Y1 is used as the verification set and verification data sets Y2 to Yk as the training set; for the second user behavior model, verification data set Y2 is used as the verification set and verification data sets Y1 and Y3 to Yk as the training set; and so on. Each user behavior model is trained on its training set, the trained model outputs the verification result for its verification set, and the average of the k verification results is taken as the real-time model verification result.
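The k-fold procedure above can be sketched in pure Python. The callback `train_and_score`, which stands for "train one model instance on the training set and return its verification result on the verification set", is a hypothetical placeholder; spreading any remainder over the first folds is likewise an assumption for data whose size is not divisible by k.

```python
def k_fold_splits(data, k):
    """Yield (training_set, verification_set) pairs; fold i is held out in turn."""
    if k < 2:
        raise ValueError("k must be greater than 1")
    fold_size, remainder = divmod(len(data), k)
    folds, start = [], 0
    for i in range(k):
        # Spread any remainder over the first folds so sizes differ by at most 1.
        end = start + fold_size + (1 if i < remainder else 0)
        folds.append(data[start:end])
        start = end
    for i in range(k):
        training = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield training, folds[i]


def cross_validate(data, k, train_and_score):
    """Average the k verification results returned by the hypothetical callback."""
    scores = [train_and_score(train, val) for train, val in k_fold_splits(data, k)]
    return sum(scores) / len(scores)


splits = list(k_fold_splits(list(range(6)), k=3))
mean_score = cross_validate(list(range(6)), 3, lambda train, val: len(val))
```

Each sample appears in exactly one verification set and in k - 1 training sets, which is why every record contributes to the averaged result.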
In the embodiment of the application, dividing the real-time verification data into k verification data sets and verifying k user behavior models separately increases the amount of data used when verifying the user behavior model and improves the accuracy of the real-time model verification result.
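The k-fold verification described above can be sketched as follows. This is a minimal illustration only, under stated assumptions: `MeanModel` is a hypothetical stand-in for the user behavior model (the application uses a decision tree model), the verification result is taken to be a mean absolute error, and the names `split_into_k` and `k_fold_verify` are not from the application.

```python
import copy
from statistics import mean


class MeanModel:
    """Hypothetical stand-in for the user behavior model: predicts the training mean."""

    def __init__(self):
        self.value = 0.0

    def fit(self, samples):
        # Each sample is a (feature, target) pair; only the target is used here.
        self.value = mean(y for _, y in samples)

    def score(self, samples):
        # Assumed verification result: mean absolute error (lower is better).
        return mean(abs(y - self.value) for _, y in samples)


def split_into_k(data, k):
    """Split data into k nearly equal, contiguous verification data sets Y1..Yk."""
    if k <= 1 or k > len(data):
        raise ValueError("k must satisfy 1 < k <= len(data)")
    base, extra = divmod(len(data), k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        folds.append(data[start:start + size])
        start += size
    return folds


def k_fold_verify(pretrained_model, data, k):
    """Return the k verification results and their average."""
    folds = split_into_k(data, k)
    results = []
    for i in range(k):
        # One of the k identical instances of the pre-trained model.
        model = copy.deepcopy(pretrained_model)
        # The i-th set is the verification set; the other k-1 sets are training sets.
        training = [s for j, fold in enumerate(folds) if j != i for s in fold]
        model.fit(training)
        results.append(model.score(folds[i]))
    return results, mean(results)
```

Calling `k_fold_verify(MeanModel(), data, k)` returns the per-fold results together with their average, which corresponds to the real-time model verification result in the description above.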
In an alternative embodiment of the present application, when the real-time baseline is determined according to the real-time model verification result, the mean square error of the k verification results may be calculated, and a preset risk score formula is then used to calculate a final score, which is the real-time baseline. For example, the preset risk score formula may map RMSE to Score, where Score is the calculated real-time baseline and RMSE is the mean square error of the k verification results described above. Because the k verification results are outputs of the user behavior model and reflect the real-time access behavior corresponding to user access data at a more recent moment, determining the real-time baseline according to the mean square error of the k verification results allows the average level of the real-time access behavior corresponding to the real-time user access data to be determined more accurately. The obtained real-time baseline may be used as the historical baseline in subsequent verification.
In some embodiments, the detecting abnormal behavior of the real-time user access data according to the real-time baseline and the historical baseline corresponding to the historical user access data includes:
Calculating a difference between the real-time baseline and the historical baseline;
and detecting abnormal behaviors of the real-time user access data according to a preset threshold range in which the difference value falls.
Specifically, the difference between the real-time baseline and the historical baseline is calculated, the preset threshold range in which the difference falls is determined, and a corresponding abnormal behavior strategy is selected according to that range. The number of preset threshold ranges may be 1 or more than 1. When there is one preset threshold range, a difference within the range indicates that no abnormal user behavior exists, and a difference outside the range indicates that abnormal user behavior exists. When there is more than one preset threshold range, each preset threshold range may be a continuous threshold interval, and different threshold intervals may correspond to different abnormal behavior detection results. For example, assume the preset threshold ranges are three consecutive threshold intervals. If the difference falls within the first preset threshold range (e.g., [0, 0.45]), it is determined that no abnormal user behavior exists. If the difference falls within the second preset threshold range (e.g., [0.45, 0.75]), it is determined that abnormal user behavior may exist, and a second verification is performed, that is, the real-time baseline of the real-time user access data is determined again using the pre-trained user behavior model and the judgment is repeated. If the difference falls within the third preset threshold range (e.g., [0.75, 1]), it is determined that abnormal user behavior exists, and an alarm is raised.
In the embodiment of the application, different abnormal behavior strategies can be determined through the preset threshold range in which the difference value between the real-time baseline and the historical baseline falls, so that the flexibility of abnormal behavior detection is improved.
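As an illustration of the three-interval example, the following sketch maps the difference between the two baselines to a strategy. It rests on assumptions that the application does not state: the difference is taken as an absolute value, the shared boundary values 0.45 and 0.75 are assigned to the higher interval, and the function name and return labels are hypothetical.

```python
def classify_deviation(realtime_baseline, historical_baseline,
                       review_threshold=0.45, alarm_threshold=0.75):
    """Map the baseline difference to an abnormal behavior strategy.

    Assumptions (not from the application): absolute difference, and
    boundary values belong to the higher interval.
    """
    difference = abs(realtime_baseline - historical_baseline)
    if difference < review_threshold:
        return "normal"               # first range: no abnormal behavior
    if difference < alarm_threshold:
        return "second_verification"  # second range: verify again
    return "alarm"                    # third range (or above): raise an alarm
```

The two thresholds are parameters so that a deployment with a single preset threshold range can set them to the same value, in which case only "normal" and "alarm" are returned.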
It should be understood that the sequence numbers of the steps in the foregoing embodiments do not imply an order of execution. The execution order of each process should be determined by its function and internal logic, and should not limit the implementation process of the embodiments of the present application.
Corresponding to the abnormal behavior detection method described in the above embodiments, fig. 2 shows a schematic structural diagram of the abnormal behavior detection device provided in the embodiment of the present application, and for convenience of explanation, only the portion related to the embodiment of the present application is shown.
Referring to fig. 2, the apparatus may be an abnormal behavior detection apparatus 21, and the abnormal behavior detection apparatus 21 may include a real-time data acquisition module 211, a real-time result determination module 212, and an abnormal behavior detection module 213.
Specifically, the modules of the abnormal behavior detection apparatus 21 are configured as follows:
The real-time data acquisition module 211 is configured to acquire real-time user access data;
The real-time result determining module 212 is configured to determine a real-time access result of the real-time user access data by using a pre-trained user behavior model, where the real-time access result at least includes a real-time access behavior corresponding to the real-time user access data, and the pre-trained user behavior model is obtained by training historical user access data, and a user corresponding to the historical user access data and a user corresponding to the real-time user access data have the same user portrait;
the abnormal behavior detection module 213 is configured to perform abnormal behavior detection on the real-time user access data according to the real-time access result and a historical access result corresponding to the historical user access data, so as to obtain an abnormal behavior detection result.
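The division into three modules can be pictured with the following structural sketch. It is illustrative only: the class name, the field names, and the use of plain callables for modules 211 to 213 are assumptions, and the lambdas in the usage example are hypothetical placeholders rather than the behavior model of the application.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class AbnormalBehaviorDetectionApparatus:
    acquire: Callable[[], Any]              # stands in for module 211
    determine_result: Callable[[Any], Any]  # stands in for module 212
    detect: Callable[[Any, Any], bool]      # stands in for module 213

    def run(self, historical_result: Any) -> bool:
        """Acquire data, determine the real-time result, then detect."""
        data = self.acquire()
        realtime_result = self.determine_result(data)
        return self.detect(realtime_result, historical_result)


# Usage with hypothetical placeholder modules.
apparatus = AbnormalBehaviorDetectionApparatus(
    acquire=lambda: {"path": "/admin/users"},
    determine_result=lambda d: "admin_access" if d["path"].startswith("/admin") else "browse",
    detect=lambda realtime, historical: realtime != historical,
)
is_abnormal = apparatus.run(historical_result="browse")
```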
In another alternative embodiment of the present application, the abnormal behavior detection apparatus 21 further includes a model training module, where the model training module is configured to perform the following before the real-time access result of the real-time user access data is determined by using the pre-trained user behavior model:
Acquiring the historical user access data in a preset time period, and constructing a knowledge graph based on the historical user access data in the preset time period, wherein the knowledge graph is used for indicating the corresponding relation between different historical user access data in the preset time period;
Determining historical behavior data according to the knowledge graph and preset behavior types, wherein the historical behavior data comprise user attribute data of different behavior types;
And constructing different types of user portraits according to the historical behavior data, and training a pre-constructed decision tree model by utilizing the different types of user portraits to obtain the pre-trained user behavior model corresponding to the different types of user portraits.
Correspondingly, in the case where the user behavior model is trained using user portraits, the real-time result determining module 212, when determining the real-time access result of the real-time user access data by using the pre-trained user behavior model, is configured to perform:
Determining a user portrait corresponding to the real-time user access data, and matching the pre-trained user behavior model according to the user portrait corresponding to the real-time user access data;
And taking the real-time user access data as the input of the matched pre-trained user behavior model, and outputting the real-time access result corresponding to the real-time user access data by utilizing the matched pre-trained user behavior model.
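A minimal sketch of matching a model by user portrait is shown below. The dictionary keyed by portrait type, the `portrait_of` callable, and the function name are hypothetical; the application does not specify how portraits are represented or how matching is carried out.

```python
def determine_realtime_result(access_data, portrait_of, models):
    """Select the model matching the user's portrait and apply it.

    portrait_of: hypothetical callable mapping access data to a portrait type.
    models: hypothetical mapping from portrait type to a trained model callable.
    """
    portrait = portrait_of(access_data)
    model = models.get(portrait)
    if model is None:
        raise KeyError(f"no pre-trained model for portrait {portrait!r}")
    # The access data is the model input; the output is the real-time access result.
    return model(access_data)
```

Raising an error when no model matches is a design choice made here for clarity; an implementation could instead fall back to a default model.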
In some embodiments, the model training module, when constructing different types of user portraits according to the historical behavior data and training a pre-constructed decision tree model by using the different types of user portraits, comprises:
Clustering the historical behavior data, and obtaining different types of user portraits according to a clustering result, wherein the user portraits comprise association relations between users and the user attribute data;
for any type of the user portraits, selecting user attribute data from the user portrait as a root node according to the association relation, calculating a histogram corresponding to the root node, determining a splitting feature of the root node and a split point corresponding to the splitting feature according to the histogram, and splitting the root node according to the splitting feature and the corresponding split point to obtain split leaf nodes;
and for any one of the leaf nodes, calculating a histogram corresponding to the leaf node, determining a new splitting feature and a new split point according to the histogram corresponding to the leaf node, splitting the leaf node by using the new splitting feature and the new split point to obtain new leaf nodes, and returning to the step of calculating the histogram corresponding to the leaf node until a preset splitting condition is met.
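The histogram-based choice of a splitting feature and split point for a single node can be sketched as follows. This is an assumption-laden illustration: the application does not state the split criterion, so a variance-reduction gain on numeric targets is used here, features are binned into equal-width buckets, and the names `build_histogram` and `best_split` are hypothetical. Growing the tree would repeat `best_split` on each resulting leaf until the preset splitting condition is met.

```python
def build_histogram(values, targets, n_bins, lo, hi):
    """Accumulate per-bin sample counts and target sums over equal-width bins."""
    counts = [0] * n_bins
    sums = [0.0] * n_bins
    width = (hi - lo) / n_bins or 1.0  # guard against a zero-width range
    for v, t in zip(values, targets):
        b = min(int((v - lo) / width), n_bins - 1)  # clamp the maximum value
        counts[b] += 1
        sums[b] += t
    return counts, sums, width


def best_split(rows, targets, n_bins=8):
    """Return (feature_index, split_point, gain) with the largest gain, or None."""
    n = len(rows)
    total = sum(targets)
    best = None
    for f in range(len(rows[0])):
        values = [r[f] for r in rows]
        lo, hi = min(values), max(values)
        if lo == hi:
            continue  # a constant feature cannot split the node
        counts, sums, width = build_histogram(values, targets, n_bins, lo, hi)
        left_n, left_sum = 0, 0.0
        # Scan bin boundaries; samples in bins 0..b go to the left child.
        for b in range(n_bins - 1):
            left_n += counts[b]
            left_sum += sums[b]
            right_n = n - left_n
            if left_n == 0 or right_n == 0:
                continue
            right_sum = total - left_sum
            # Assumed criterion: variance-reduction gain from bin statistics.
            gain = left_sum ** 2 / left_n + right_sum ** 2 / right_n - total ** 2 / n
            if best is None or gain > best[2]:
                best = (f, lo + (b + 1) * width, gain)
    return best
```

Because candidate split points are limited to bin boundaries, each feature is scanned in time proportional to the number of bins rather than the number of distinct values, which is the usual motivation for histogram-based splitting.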
In some embodiments, the historical access result includes a historical access behavior, and the abnormal behavior detection module 213, when performing abnormal behavior detection on the real-time user access data according to the real-time access result and the historical access result corresponding to the historical user access data to obtain the abnormal behavior detection result, is configured to perform:
comparing whether the real-time access behavior is consistent with the historical access behavior, and determining the abnormal behavior detection result according to the comparison result.
In another alternative embodiment of the present application, because each training of the user behavior model uses historical user access data that precedes the real-time user access data, the abnormal behavior detection apparatus 21 further includes a model verification module, where the model verification module is configured to perform the following before abnormal behavior detection is performed on the real-time user access data according to the real-time baseline:
And carrying out model verification on the pre-trained user behavior model according to real-time verification data to obtain a real-time model verification result, wherein the real-time verification data comprises the real-time user access data, and the real-time model verification result is used for reflecting the performance of the pre-trained user behavior model.
Correspondingly, the real-time access result further comprises a real-time baseline corresponding to the real-time user access behavior, the real-time baseline is used for indicating the average level of the real-time access behavior corresponding to the real-time user access data, and the abnormal behavior detection module 213 is configured to perform:
determining the real-time baseline according to the real-time model verification result under the condition that the real-time model verification result indicates that the performance of the pre-trained user behavior model passes;
and detecting abnormal behaviors of the real-time user access data according to the real-time baseline and a historical baseline corresponding to the historical user access data, wherein the historical baseline is used for indicating the average level of the historical access behaviors corresponding to the historical user access data.
In some embodiments, in order to further improve accuracy in user behavior model verification, the model verification module, when performing model verification on the pre-trained user behavior model according to the real-time verification data to obtain the real-time model verification result, is configured to perform:
Dividing the real-time verification data into k verification data sets, and instantiating the pre-trained user behavior model into k identical user behavior models, wherein k is a positive integer greater than 1;
for any user behavior model, sequentially taking 1 verification data set as a verification set, taking the rest k-1 verification data sets as training sets, training the user behavior model by using the training sets, and outputting a verification result corresponding to the verification set by using the trained user behavior model;
And determining the real-time model verification result according to k verification results.
In some embodiments, the abnormal behavior detection module 213, when detecting abnormal behavior of the real-time user access data according to the real-time baseline and the historical baseline corresponding to the historical user access data, includes:
Calculating a difference between the real-time baseline and the historical baseline;
and detecting abnormal behaviors of the real-time user access data according to a preset threshold range in which the difference value falls.
It should be noted that, because the content of information interaction and execution process between the devices/units is based on the same concept as the method embodiment of the present application, specific functions and technical effects thereof may be referred to in the method embodiment section, and will not be described herein.
Fig. 3 is a schematic structural diagram of an electronic device according to an embodiment of the application. As shown in fig. 3, the electronic device 3 of this embodiment comprises at least one processor 30 (only one shown in fig. 3), a memory 31 and a computer program 32 stored in said memory 31 and executable on said at least one processor 30. The steps of any of the various method embodiments are performed by the processor 30 when executing the computer program 32.
The electronic device 3 may be a computing device such as a desktop computer, a notebook computer, a palmtop computer, or a cloud server. The electronic device may include, but is not limited to, a processor 30 and a memory 31. It will be appreciated by those skilled in the art that fig. 3 is merely an example of the electronic device 3 and does not constitute a limitation of the electronic device 3, which may include more or fewer components than shown, may combine certain components, or may use different components; for example, the electronic device may further include an input/output device, a network access device, a bus, etc.
The processor 30 may be a Central Processing Unit (CPU), another general purpose processor, a Digital Signal Processor (DSP), an Application-Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
The memory 31 may in some embodiments be an internal storage unit of the electronic device 3, such as a hard disk or a memory of the electronic device 3. The memory 31 may also be an external storage device of the electronic device 3, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card provided on the electronic device 3. Further, the memory 31 may also include both an internal storage unit and an external storage device of the electronic device 3. The memory 31 is used for storing an operating system, application programs, a boot loader (BootLoader), data, and other programs, such as the program code of the computer program. The memory 31 may also be used for temporarily storing data that has been output or is to be output.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above division of functional units and modules is used as an example. In practical applications, the above functions may be allocated to different functional units and modules as required, that is, the internal structure of the apparatus may be divided into different functional units or modules to perform all or part of the functions described above. The functional units and modules in the embodiment may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit, where the integrated units may be implemented in the form of hardware or in the form of a software functional unit. In addition, the specific names of the functional units and modules are only for distinguishing them from each other, and are not used for limiting the protection scope of the present application. For the specific working process of the units and modules in the system, reference may be made to the corresponding process in the foregoing method embodiments, which is not described herein again.
The embodiment of the application also provides a network device, which comprises at least one processor, a memory and a computer program stored in the memory and capable of running on the at least one processor, wherein the steps in any of the various method embodiments are realized when the computer program is executed by the processor.
Embodiments of the present application also provide a computer readable storage medium storing a computer program which, when executed by a processor, implements steps of the various method embodiments.
Embodiments of the present application provide a computer program product which, when run on an electronic device, causes the electronic device to perform the steps that may be implemented in the various method embodiments described.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer readable storage medium. Based on such understanding, all or part of the flow of the methods of the above embodiments may be implemented by a computer program instructing related hardware. The computer program may be stored in a computer readable storage medium and, when executed by a processor, implements the steps of the method embodiments described above. The computer program comprises computer program code, which may be in source code form, object code form, an executable file, or some intermediate form. The computer readable medium may include at least: any entity or device capable of carrying the computer program code to the electronic device, a recording medium, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunications signal, and a software distribution medium, for example a USB flash drive, a removable hard disk, a magnetic disk, or an optical disk. In some jurisdictions, in accordance with legislation and patent practice, computer readable media may not include electrical carrier signals and telecommunications signals.
In the foregoing embodiments, the description of each embodiment has its own emphasis. For parts that are not described or detailed in a particular embodiment, reference may be made to the related descriptions of other embodiments.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus/network device and method may be implemented in other manners. For example, the apparatus/network device embodiments described above are merely illustrative; the division of the modules or units is merely a logical functional division, and there may be other divisions in actual implementation, e.g., multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection via interfaces, devices, or units, which may be in electrical, mechanical, or other forms.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
The foregoing embodiments are merely illustrative of the technical solutions of the present application, and not restrictive, and although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those skilled in the art that modifications may still be made to the technical solutions described in the foregoing embodiments or equivalent substitutions of some technical features thereof, and that such modifications or substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present application.