CN117972581A - Abnormal login early warning method, device, equipment and storage medium - Google Patents
- Publication number
- CN117972581A (application number CN202311832453.6A)
- Authority
- CN
- China
- Prior art keywords
- abnormal
- login
- data
- detection model
- abnormal login
- Legal status
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/243—Classification techniques relating to the number of classes
- G06F18/2431—Multiple classes
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/09—Supervised learning
Abstract
The present invention relates to the field of computers and discloses an abnormal login early warning method, device, equipment and storage medium. By receiving data in real time and making predictions with an Isolation Forest-based abnormal login detection model, the method can respond quickly to abnormal login behavior and issue warnings promptly. The method includes: constructing an abnormal login detection model based on Isolation Forest; receiving login request data in real time and extracting feature data from the login request data; inputting the feature data into the abnormal login detection model and obtaining the model's output result; determining, according to the output result, whether the login request is abnormal, and if so, invoking an early warning strategy to handle it; and periodically collecting anomaly correction feedback data and optimizing the abnormal login detection model according to that feedback data.
Description
Technical Field
The present invention relates to the field of computer technology, and in particular to an abnormal login early warning method, device, equipment and storage medium.
Background
With the development of the Internet industry, more and more business activities have moved from offline to online. For example, common enterprise operations such as fixed-asset management platforms and attendance-processing platforms are now handled online. This saves resources and reduces labor costs, but it also introduces many security risks.
Traditional abnormal login detection usually defines login behavior through a set of rules and identifies abnormal logins by matching them against those pre-specified rules. Rule-based methods of this kind can identify abnormal logins in simple scenarios, but as data volumes grow and business scenarios expand, defining and selecting rules becomes increasingly complicated, and the technical demands on the staff responsible for anomaly detection rise accordingly. Moreover, because login patterns differ between scenarios, developers must design new rules every time the business scenario changes, which greatly increases development difficulty and workload and lowers development efficiency.
The existing technology therefore still needs to be improved and developed.
Summary of the Invention
The present invention provides an abnormal login early warning method, device, equipment and storage medium that receive data in real time and make predictions with an Isolation Forest-based abnormal login detection model, so that abnormal login behavior can be responded to quickly and warnings issued promptly.
A first aspect of the present invention provides an abnormal login early warning method, including: constructing an abnormal login detection model based on Isolation Forest; receiving login request data in real time and extracting feature data from the login request data; inputting the feature data into the abnormal login detection model to obtain the model's output result; determining, according to the output result, whether the login request is abnormal, and if so, invoking an early warning strategy to handle it; and periodically collecting anomaly correction feedback data and optimizing the abnormal login detection model according to that feedback data.
Optionally, in a first implementation of the first aspect of the present invention, constructing the abnormal login detection model based on Isolation Forest includes: collecting login log data of historical accounts; preprocessing the login log data of the historical accounts to generate a login log data set; and inputting the login log data set into an Isolation Forest model for training to obtain the abnormal login detection model.
Optionally, in a second implementation of the first aspect of the present invention, inputting the login log data set into the Isolation Forest model for training to obtain the abnormal login detection model includes: dividing the login log data set into a first training set and a second training set, where the first training set contains both normal and abnormal login samples and the second training set contains only abnormal login samples; inputting the first training set into the Isolation Forest model for training to obtain an initial training model; and inputting the second training set into the initial training model for further training to obtain the abnormal login detection model.
Optionally, in a third implementation of the first aspect of the present invention, receiving login request data in real time and extracting its feature data includes: using Apache Spark together with Kafka to establish a stream processing framework that receives login request data in real time; parsing the login request data and dividing it into multiple fields; and extracting the feature data of the login request data from those fields, the feature data including time features, geographic location features, device information features and login behavior features.
Optionally, in a fourth implementation of the first aspect of the present invention, inputting the feature data into the abnormal login detection model and obtaining its output result includes: obtaining the feature vector corresponding to each item of feature data; inputting each feature vector into the abnormal login detection model; and obtaining the average leaf node height (that is, the average path length across the isolation trees) of each feature vector as output by the abnormal login detection model.
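The patent does not state how the average leaf node height is normalised. For reference, the original Isolation Forest formulation by Liu, Ting and Zhou turns the average path length E[h(x)] of a vector x, measured over trees grown on subsamples of size n, into an anomaly score as follows:

```latex
c(n) = 2H(n-1) - \frac{2(n-1)}{n}, \qquad H(i) \approx \ln i + 0.5772156649 \\
s(x, n) = 2^{-\frac{E[h(x)]}{c(n)}} \\
E[h(x)] \to 0 \;\Rightarrow\; s \to 1, \qquad
E[h(x)] = c(n) \;\Rightarrow\; s = 0.5, \qquad
E[h(x)] \to n - 1 \;\Rightarrow\; s \to 0
```

In this standard form anomalies are isolated early and therefore have short paths, so it is the normalised score s, not the raw path length, that rises above a threshold for suspicious logins. The source does not specify how its "greater than the preset threshold" comparison maps onto this convention.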
Optionally, in a fifth implementation of the first aspect of the present invention, determining according to the output result whether the login request is abnormal, and invoking an early warning strategy if it is, includes: comparing the average leaf node height of each feature vector with a preset anomaly threshold, and determining that the login request is abnormal if that average leaf node height is greater than the preset anomaly threshold; when the login request is determined to be abnormal, calculating the difference between the average leaf node height and the preset anomaly threshold and assigning an anomaly level according to that difference; and invoking the early warning strategy corresponding to the assigned anomaly level.
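A minimal Python sketch of the fifth implementation, under stated assumptions: the comparison is made on the normalised Isolation Forest anomaly score (higher means more anomalous) rather than on the raw path length, and the threshold of 0.6 and the band width of 0.1 that separate the low, medium and high levels are hypothetical values chosen for illustration, not taken from the patent.

```python
import math
from typing import Optional

EULER_GAMMA = 0.5772156649015329


def c(n: int) -> float:
    """Normalising constant: expected path length of an unsuccessful BST search among n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def anomaly_score(avg_path_length: float, sample_size: int) -> float:
    """Convert an average leaf-node depth into a score in (0, 1]; sample_size must be >= 2."""
    return 2.0 ** (-avg_path_length / c(sample_size))


def grade_login(score: float, threshold: float = 0.6, band: float = 0.1) -> Optional[str]:
    """None for a normal login, else 'low', 'medium' or 'high' by how far score exceeds threshold."""
    if score <= threshold:
        return None
    excess = score - threshold          # the "difference" used to choose the warning level
    if excess < band:
        return "low"
    if excess < 2 * band:
        return "medium"
    return "high"


# Usage: a path exactly as long as the average for 256-point subsamples scores 0.5.
typical = anomaly_score(c(256), 256)
levels = [grade_login(s) for s in (typical, 0.65, 0.75, 0.95)]
```

Grading on the excess over the threshold, rather than on the raw score, keeps the level boundaries tied to whatever threshold the feedback loop later selects.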
Optionally, in a sixth implementation of the first aspect of the present invention, periodically collecting anomaly correction feedback data and optimizing the abnormal login detection model according to it includes: periodically collecting anomaly correction feedback data; analyzing the feedback data to obtain analysis results, which include a false alarm rate and a missed alarm rate; and optimizing the parameters of the abnormal login detection model according to the analysis results and retraining the model with the optimized parameters.
A second aspect of the present invention provides an abnormal login early warning device, including: a model building module, configured to build an abnormal login detection model based on Isolation Forest; an extraction module, configured to receive login request data in real time and extract feature data from it; an anomaly detection module, configured to input the feature data into the abnormal login detection model and obtain the model's output result; an early warning module, configured to determine according to the output result whether the login request is abnormal and, if so, invoke an early warning strategy to handle it; and a model optimization module, configured to periodically collect anomaly correction feedback data and optimize the abnormal login detection model according to it.
Optionally, in a first implementation of the second aspect of the present invention, the model building module includes: a first collection unit, configured to collect login log data of historical accounts; a preprocessing unit, configured to preprocess that login log data to generate a login log data set; and a training unit, configured to input the login log data set into an Isolation Forest model for training to obtain the abnormal login detection model.
Optionally, in a second implementation of the second aspect of the present invention, the extraction module includes: a real-time receiving unit, configured to use Apache Spark together with Kafka to establish a stream processing framework that receives login request data in real time; a parsing unit, configured to parse the login request data and divide it into multiple fields; and an extraction unit, configured to extract the feature data of the login request data from those fields, the feature data including time features, geographic location features, device information features and login behavior features.
Optionally, in a third implementation of the second aspect of the present invention, the anomaly detection module includes: a first acquisition unit, configured to acquire the feature vector corresponding to each item of feature data; an anomaly detection unit, configured to input each feature vector into the abnormal login detection model; and a second acquisition unit, configured to acquire the average leaf node height of each feature vector as output by the abnormal login detection model.
Optionally, in a fourth implementation of the second aspect of the present invention, the early warning module includes: a comparison unit, configured to compare the average leaf node height of each feature vector with a preset anomaly threshold and to determine that the login request is abnormal if the average leaf node height is greater than the threshold; a division unit, configured, when the login request is determined to be abnormal, to calculate the difference between the average leaf node height and the preset anomaly threshold and assign an anomaly level according to that difference; and an early warning unit, configured to invoke the early warning strategy corresponding to the assigned anomaly level.
Optionally, in a fifth implementation of the second aspect of the present invention, the model optimization module includes: a first collection unit, configured to periodically collect anomaly correction feedback data; an analysis unit, configured to analyze the feedback data to obtain analysis results, which include a false alarm rate and a missed alarm rate; and an optimization unit, configured to optimize the parameters of the abnormal login detection model according to the analysis results and retrain the model with the optimized parameters.
A third aspect of the present invention provides abnormal login early warning equipment, including a memory and at least one processor, where the memory stores computer-readable instructions and the memory and the at least one processor are interconnected through lines; the at least one processor invokes the computer-readable instructions in the memory so that the abnormal login early warning equipment performs each step of the abnormal login early warning method described above.
A fourth aspect of the present invention provides a computer-readable storage medium storing computer-readable instructions which, when run on a computer, cause the computer to perform each step of the abnormal login early warning method described above.
In the technical solution provided by the present invention, data is received in real time and predictions are made by the abnormal login detection model, so abnormal login behavior can be responded to quickly and warnings issued promptly. Because the model uses the Isolation Forest algorithm to identify outliers, it can process large-scale data efficiently and improve detection accuracy. In addition, the model is continuously optimized with collected anomaly correction feedback data so that it adapts to new login behavior patterns, improving adaptability and accuracy.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a first flow chart of the abnormal login early warning method provided by an embodiment of the present invention;
FIG. 2 is a second flow chart of the abnormal login early warning method provided by an embodiment of the present invention;
FIG. 3 is a third flow chart of the abnormal login early warning method provided by an embodiment of the present invention;
FIG. 4 is a fourth flow chart of the abnormal login early warning method provided by an embodiment of the present invention;
FIG. 5 is a fifth flow chart of the abnormal login early warning method provided by an embodiment of the present invention;
FIG. 6 is a sixth flow chart of the abnormal login early warning method provided by an embodiment of the present invention;
FIG. 7 is a schematic structural diagram of the abnormal login early warning device provided by an embodiment of the present invention;
FIG. 8 is a schematic structural diagram of the abnormal login early warning equipment provided by an embodiment of the present invention.
Detailed Description of the Embodiments
Embodiments of the present invention provide an abnormal login early warning method, device, equipment and storage medium. By receiving data in real time and making predictions with an Isolation Forest-based abnormal login detection model, the method can respond quickly to abnormal login behavior and issue warnings promptly. The method includes: constructing an abnormal login detection model based on Isolation Forest; receiving login request data in real time and extracting feature data from the login request data; inputting the feature data into the abnormal login detection model to obtain the model's output result; determining, according to the output result, whether the login request is abnormal, and if so, invoking an early warning strategy to handle it; and periodically collecting anomaly correction feedback data and optimizing the abnormal login detection model according to that feedback data.
The terms "first", "second", "third", "fourth", etc. (if any) in the description, claims and drawings of the present invention are used to distinguish similar objects and do not necessarily describe a specific order or sequence. It should be understood that objects so designated are interchangeable where appropriate, so that the embodiments described herein can be implemented in orders other than those illustrated or described. In addition, the terms "include" and "have" and any variants thereof are intended to cover non-exclusive inclusion: for example, a process, method, system, product or device that includes a series of steps or units is not necessarily limited to the steps or units explicitly listed, but may include other steps or units that are not explicitly listed or that are inherent to the process, method, product or device.
For ease of understanding, the specific process of an embodiment of the present invention is described below. Referring to FIG. 1, a first embodiment of the abnormal login early warning method in the embodiments of the present invention includes:
S101. Construct an abnormal login detection model based on Isolation Forest.
It can be understood that the executing entity of the present invention may be the abnormal login early warning device, a terminal or a server, which is not limited here. The embodiments of the present invention are described with a server as the executing entity.
Isolation Forest is a tree-based anomaly detection algorithm suited to anomaly detection in high-dimensional data sets. Its core idea is to detect outliers quickly by building an ensemble of randomly split binary trees (isolation trees): because anomalies are few and differ markedly from normal points, they tend to be isolated after fewer splits, that is, closer to the root of each tree.
Isolation Forest builds each isolation tree by randomly selecting a feature and a split value. This random partitioning helps locate outliers quickly, and an Isolation Forest-based abnormal login detection model can exploit this property to identify anomalies in login behavior.
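To make the random-partitioning idea concrete, here is a small, dependency-free Python sketch of an isolation forest. It follows the usual construction (random feature, random split value between that feature's minimum and maximum, depth limit of ceil(log2 psi), and a c(n) correction at leaves that were not split further), with one simplification: constant features are skipped when choosing a split. The class and function names are illustrative; a production system would more likely rely on an existing library implementation.

```python
import math
import random
from dataclasses import dataclass
from typing import List, Optional, Sequence

EULER_GAMMA = 0.5772156649015329


def c(n: int) -> float:
    """Expected path length of an unsuccessful BST search among n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


@dataclass
class Node:
    size: int                      # training points that reached this node
    feature: int = -1              # split feature (-1 marks a leaf)
    threshold: float = 0.0         # split value
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build_tree(X: List[Sequence[float]], depth: int, max_depth: int, rng: random.Random) -> Node:
    if depth >= max_depth or len(X) <= 1:
        return Node(size=len(X))
    # Only features that still vary can separate the points at this node.
    ranges = [(f, min(r[f] for r in X), max(r[f] for r in X)) for f in range(len(X[0]))]
    candidates = [(f, lo, hi) for f, lo, hi in ranges if lo < hi]
    if not candidates:
        return Node(size=len(X))
    f, lo, hi = rng.choice(candidates)
    t = rng.uniform(lo, hi)        # random split value inside the feature's range
    left = [r for r in X if r[f] < t]
    right = [r for r in X if r[f] >= t]
    return Node(size=len(X), feature=f, threshold=t,
                left=build_tree(left, depth + 1, max_depth, rng),
                right=build_tree(right, depth + 1, max_depth, rng))


def path_length(x: Sequence[float], node: Node, depth: int = 0) -> float:
    if node.left is None:          # leaf: add the expected depth of the unbuilt subtree
        return depth + c(node.size)
    child = node.left if x[node.feature] < node.threshold else node.right
    return path_length(x, child, depth + 1)


class IsolationForest:
    def __init__(self, n_trees: int = 100, sample_size: int = 256, seed: int = 0):
        self.n_trees = n_trees
        self.sample_size = sample_size
        self.seed = seed
        self.trees: List[Node] = []
        self.psi = 0

    def fit(self, X: List[Sequence[float]]) -> "IsolationForest":
        if len(X) < 2:
            raise ValueError("need at least two training points")
        rng = random.Random(self.seed)
        self.psi = min(self.sample_size, len(X))
        max_depth = math.ceil(math.log2(self.psi))
        self.trees = [build_tree(rng.sample(X, self.psi), 0, max_depth, rng)
                      for _ in range(self.n_trees)]
        return self

    def average_path_length(self, x: Sequence[float]) -> float:
        """Average depth at which x lands in a leaf, over all trees."""
        return sum(path_length(x, t) for t in self.trees) / len(self.trees)

    def score(self, x: Sequence[float]) -> float:
        """Anomaly score in (0, 1]; values near 1 indicate likely anomalies."""
        return 2.0 ** (-self.average_path_length(x) / c(self.psi))


# Usage: 300 "normal" logins as 2-D feature vectors, then a typical and an outlying one.
data_rng = random.Random(42)
normal_logins = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(300)]
forest = IsolationForest(n_trees=100, sample_size=128, seed=7).fit(normal_logins)
typical_score = forest.score([0.0, 0.0])
outlier_score = forest.score([8.0, 8.0])
```

The outlying vector falls on the same side of every split, so it reaches a leaf after only a few levels and receives the higher score.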
In this embodiment, building the Isolation Forest-based abnormal login detection model involves collecting training samples, extracting meaningful features such as login time, geographic location and device information, converting them into a format the model can use, and then training the Isolation Forest model.
During training, cross-validation can be used to evaluate the Isolation Forest model.
S102. Receive login request data in real time and extract feature data from it.
In this embodiment, a data receiving endpoint, which may be an API endpoint or a message queue, needs to be set up in advance to receive login request data in real time.
The feature data includes login time, IP address, device information, login result and similar data.
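A minimal sketch of per-request feature extraction in Python. It assumes the request arrives as a JSON object with hypothetical fields `timestamp` (Unix seconds), `country` (already resolved from the IP address upstream), `device_id` and `result`, and that the caller supplies the account's known countries, known devices and recent failure count. These field names and the particular features are illustrative; the patent only names the feature categories.

```python
import json
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Set


@dataclass(frozen=True)
class LoginFeatures:
    hour_of_day: int       # time feature
    is_weekend: int        # time feature
    is_new_country: int    # geographic location feature
    is_new_device: int     # device information feature
    recent_failures: int   # login behaviour feature
    login_success: int     # login result

    def to_vector(self) -> List[float]:
        return [float(self.hour_of_day), float(self.is_weekend), float(self.is_new_country),
                float(self.is_new_device), float(self.recent_failures), float(self.login_success)]


def extract_features(raw: str, known_countries: Set[str], known_devices: Set[str],
                     recent_failures: int) -> LoginFeatures:
    """Parse one login request (JSON text) into a fixed set of numeric features."""
    event = json.loads(raw)
    ts = datetime.fromtimestamp(event["timestamp"], tz=timezone.utc)
    return LoginFeatures(
        hour_of_day=ts.hour,
        is_weekend=int(ts.weekday() >= 5),            # Saturday=5, Sunday=6
        is_new_country=int(event["country"] not in known_countries),
        is_new_device=int(event["device_id"] not in known_devices),
        recent_failures=recent_failures,
        login_success=int(event["result"] == "success"),
    )


# Usage: 183600 s after the epoch is Saturday 1970-01-03 03:00 UTC.
request = '{"timestamp": 183600, "country": "FR", "device_id": "d-42", "result": "failure"}'
features = extract_features(request, known_countries={"CN"}, known_devices={"d-1"},
                            recent_failures=3)
vector = features.to_vector()
```

Keeping the feature set in one frozen dataclass gives every request the same vector layout, which the model input step depends on.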
In this embodiment, stream processing technology (such as Apache Kafka Streams or Apache Flink) can be used for real-time feature processing and transformation, so that features can be extracted and processed quickly once login request data is received.
In this embodiment, an asynchronous processing mechanism can be used to improve the system's concurrency and response speed, so that large volumes of real-time login request data are still processed promptly under high load.
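The asynchronous mechanism could look roughly like the following Python `asyncio` sketch. Incoming login events go onto a bounded queue, and a fixed pool of worker coroutines handles them concurrently. The worker count, the queue size and the `handle` callback (a stand-in for feature extraction plus model scoring) are assumptions for illustration.

```python
import asyncio
from typing import Any, Awaitable, Callable, Iterable, List


async def process_login_stream(events: Iterable[Any],
                               handle: Callable[[Any], Awaitable[Any]],
                               n_workers: int = 4,
                               max_queue: int = 1000) -> List[Any]:
    """Fan login events out to worker coroutines; results come back in completion order."""
    queue: asyncio.Queue = asyncio.Queue(maxsize=max_queue)   # bounded: applies back-pressure
    results: List[Any] = []

    async def worker() -> None:
        while True:
            event = await queue.get()
            try:
                results.append(await handle(event))
            except Exception as exc:    # record the failure and keep the worker alive
                results.append(exc)
            finally:
                queue.task_done()

    workers = [asyncio.create_task(worker()) for _ in range(n_workers)]
    for event in events:
        await queue.put(event)          # waits while the queue is full
    await queue.join()                  # every queued event has been handled
    for w in workers:
        w.cancel()
    await asyncio.gather(*workers, return_exceptions=True)
    return results


# Usage with a stand-in scoring coroutine.
async def fake_score(event: int) -> int:
    await asyncio.sleep(0)              # yield control, as real I/O or model calls would
    return event * 2


scored = asyncio.run(process_login_stream(range(10), fake_score))
```

The bounded queue is the design choice that matters under high load. When scoring falls behind, producers wait instead of letting memory grow without limit.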
S103. Input the feature data into the abnormal login detection model and obtain the model's output result.
In this embodiment, before the feature data is input into the model, it must be verified that the format of the feature data obtained from the stream processing framework matches the input format required by the model, and any necessary preprocessing and transformation (such as standardization or normalization) must be applied.
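One way to keep streamed features consistent with the model's input format is to fit a small schema object on the training data and reuse it at prediction time. The sketch below uses a hypothetical class and hypothetical column names. It standardises numeric columns with training-set statistics and one-hot encodes categorical columns against the training vocabulary, mapping unseen categories to all zeros. Isolation Forest draws each split uniformly between a feature's minimum and maximum, so per-feature linear scaling does not change the trees it builds. The main purpose of this step is therefore to keep training-time and prediction-time encodings identical.

```python
from dataclasses import dataclass, field
from statistics import mean, pstdev
from typing import Any, Dict, List, Mapping, Sequence


@dataclass
class FeatureSchema:
    numeric: List[str]
    categorical: List[str]
    means: Dict[str, float] = field(default_factory=dict)
    stds: Dict[str, float] = field(default_factory=dict)
    vocab: Dict[str, List[str]] = field(default_factory=dict)

    def fit(self, rows: Sequence[Mapping[str, Any]]) -> "FeatureSchema":
        """Learn standardisation statistics and category vocabularies from training rows."""
        for name in self.numeric:
            values = [float(r[name]) for r in rows]
            self.means[name] = mean(values)
            self.stds[name] = pstdev(values) or 1.0   # constant column: avoid dividing by zero
        for name in self.categorical:
            self.vocab[name] = sorted({str(r[name]) for r in rows})
        return self

    def transform(self, row: Mapping[str, Any]) -> List[float]:
        """Encode one row with exactly the layout seen during training."""
        vec = [(float(row[n]) - self.means[n]) / self.stds[n] for n in self.numeric]
        for name in self.categorical:
            value = str(row[name])
            vec.extend(1.0 if value == v else 0.0 for v in self.vocab[name])  # unseen -> zeros
        return vec


# Usage
schema = FeatureSchema(numeric=["hour"], categorical=["device"]).fit(
    [{"hour": 9, "device": "ios"}, {"hour": 11, "device": "android"}])
seen = schema.transform({"hour": 10, "device": "ios"})
unseen = schema.transform({"hour": 12, "device": "web"})
```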
After processing the feature data, the model produces an output result, which usually includes an anomaly score or anomaly probability for each sample. This output is used to determine whether the login request is abnormal.
S104. Determine according to the output result whether the login request is abnormal; if it is, invoke the early warning strategy to handle it.
In this embodiment, multi-level warning rules can be set for abnormal login requests, for example low-risk, medium-risk and high-risk warnings. A low-risk warning means the login request may be abnormal, and an anomaly reminder is issued, which includes composing the warning message, determining the warning level and designating the recipients. A medium-risk warning means the login request is strongly suspicious and requires manual review. A high-risk warning means the login request is highly suspicious, and the account is locked directly.
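The three warning levels could be wired to their handling actions roughly as follows. The in-memory notification list, review queue and locked-account set are hypothetical stand-ins for a real messaging system, ticketing workflow and account service.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Set


class RiskLevel(Enum):
    NONE = 0
    LOW = 1      # possibly abnormal: send a reminder
    MEDIUM = 2   # strongly suspicious: manual review
    HIGH = 3     # highly suspicious: lock the account


@dataclass
class WarningDispatcher:
    recipients: List[str]
    notifications: List[str] = field(default_factory=list)
    review_queue: List[str] = field(default_factory=list)
    locked_accounts: Set[str] = field(default_factory=set)

    def handle(self, account: str, level: RiskLevel) -> str:
        if level is RiskLevel.NONE:
            return "no_action"
        if level is RiskLevel.LOW:
            # Compose the warning, tag its level and address it to each designated recipient.
            for person in self.recipients:
                self.notifications.append(f"to={person} level=LOW account={account}")
            return "notified"
        if level is RiskLevel.MEDIUM:
            self.review_queue.append(account)
            return "queued_for_review"
        self.locked_accounts.add(account)
        return "account_locked"


# Usage
dispatcher = WarningDispatcher(recipients=["security-team"])
outcomes = [dispatcher.handle("alice", RiskLevel.LOW),
            dispatcher.handle("bob", RiskLevel.MEDIUM),
            dispatcher.handle("carol", RiskLevel.HIGH)]
```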
If the login request is determined not to be abnormal, no warning processing is needed.
S105. Periodically collect anomaly correction feedback data and optimize the abnormal login detection model according to it.
In this embodiment, a feedback system can be established to receive and record anomaly correction feedback data, that is, correction information supplied by users or administrators, such as confirmation that a login flagged as abnormal was in fact normal.
In this embodiment, the server periodically collects the information in the feedback system, for example monthly, quarterly or every six months; the exact interval is set according to actual needs.
In this embodiment, the abnormal login detection model is retrained or fine-tuned using the collected feedback data. Model performance can be improved by adding new training samples or adjusting model parameters, so as to reduce the false positive rate and the false negative rate.
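The analysis step of the feedback loop can be sketched as follows, assuming each feedback record pairs the anomaly score the model produced with the label a user or administrator confirmed. Here the false alarm rate is taken as FP / (FP + TN) and the missed alarm rate as FN / (FN + TP). The patent does not define these formulas, and re-selecting the decision threshold from a hypothetical candidate list is only one simple form of parameter optimization. Full retraining on the corrected samples is not shown.

```python
from dataclasses import dataclass
from typing import Iterable, Sequence, Tuple


@dataclass(frozen=True)
class Feedback:
    score: float        # anomaly score the model produced for the login
    is_abnormal: bool   # ground truth confirmed by a user or administrator


def error_rates(feedback: Iterable[Feedback], threshold: float) -> Tuple[float, float]:
    """Return (false alarm rate, missed alarm rate) when flagging score > threshold."""
    tp = fp = tn = fn = 0
    for fb in feedback:
        flagged = fb.score > threshold
        if flagged and fb.is_abnormal:
            tp += 1
        elif flagged:
            fp += 1
        elif fb.is_abnormal:
            fn += 1
        else:
            tn += 1
    false_alarm = fp / (fp + tn) if fp + tn else 0.0
    missed_alarm = fn / (fn + tp) if fn + tp else 0.0
    return false_alarm, missed_alarm


def retune_threshold(feedback: Sequence[Feedback], candidates: Sequence[float],
                     miss_weight: float = 1.0) -> float:
    """Pick the candidate threshold with the lowest weighted error on the feedback."""
    def cost(t: float) -> float:
        false_alarm, missed_alarm = error_rates(feedback, t)
        return false_alarm + miss_weight * missed_alarm
    return min(candidates, key=cost)


# Usage: five corrected logins collected during one feedback period.
period = [Feedback(0.90, True), Feedback(0.70, False), Feedback(0.65, True),
          Feedback(0.40, False), Feedback(0.30, False)]
rates_at_06 = error_rates(period, 0.6)
best_threshold = retune_threshold(period, [0.6, 0.8])
```

Raising `miss_weight` makes missed attacks costlier than false alarms and pushes the chosen threshold down. Lowering it has the opposite effect.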
This embodiment provides an abnormal login early warning method that receives data in real time and makes predictions with the abnormal login detection model, so it can respond quickly to abnormal login behavior and issue warnings promptly. Because the model uses the Isolation Forest algorithm to identify outliers, it can process large-scale data efficiently and improve detection accuracy. In addition, the model is continuously optimized with collected anomaly correction feedback data so that it adapts to new login behavior patterns, improving adaptability and accuracy.
Referring to FIG. 2, a second embodiment of the abnormal login early warning method in the embodiments of the present invention includes:
S201. Collect login log data of historical accounts.
In this embodiment, login records, abnormal situation records and security event records can be collected.
In this embodiment, a login record includes data such as a timestamp, username, login IP address and login device information.
Abnormal situation records include records of abnormal behavior such as the number of failed logins exceeding a threshold, logins from an unusual location, and overly frequent logins.
In this embodiment, the collected login log data of historical accounts can be stored in a sample database as the data source for modeling and training the anomaly detection model, helping to improve model performance and accuracy.
S202. Preprocess the login log data of historical accounts to generate a login log data set.
In this embodiment, preprocessing the login log data of historical accounts includes the following:
Data cleaning: identify and handle missing values, duplicate values, outliers and the like to ensure data quality.
Feature extraction: extract useful features from the raw log data, such as login time, IP address, device type and login result (success/failure).
Feature conversion: convert or encode certain features, for example converting timestamps to a date-time format and one-hot encoding categorical features.
Anomaly labeling: based on known abnormal login cases, label the corresponding samples as abnormal for subsequent model training.
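The preprocessing steps could be combined along these lines. The record fields (`timestamp`, `username`, `ip`, `device`, `result`) are hypothetical. Cleaning covers only missing values and exact duplicates. The labeling rule (more than `max_failures` consecutive failed logins for the same user) is just one example of the known abnormal cases the text mentions. One-hot encoding of the categorical columns would follow as a separate step.

```python
from datetime import datetime, timezone
from typing import Any, Dict, List

REQUIRED = ("timestamp", "username", "ip", "device", "result")


def preprocess(records: List[Dict[str, Any]], max_failures: int = 5) -> List[Dict[str, Any]]:
    """Clean raw login records, derive simple features and attach an anomaly label."""
    # Data cleaning: drop records with missing required fields.
    complete = [r for r in records if all(r.get(k) is not None for k in REQUIRED)]
    seen = set()
    failures: Dict[str, int] = {}
    dataset = []
    for rec in sorted(complete, key=lambda r: r["timestamp"]):   # time order for running counts
        key = tuple(rec[k] for k in REQUIRED)
        if key in seen:                                          # drop exact duplicates
            continue
        seen.add(key)
        user = rec["username"]
        # Consecutive failed logins per user; a successful login resets the count.
        failures[user] = (failures.get(user, 0) + 1) if rec["result"] == "failure" else 0
        ts = datetime.fromtimestamp(rec["timestamp"], tz=timezone.utc)  # timestamp -> date-time
        dataset.append({
            "hour": ts.hour,
            "weekday": ts.weekday(),
            "ip": rec["ip"],
            "device": rec["device"],
            "success": int(rec["result"] == "success"),
            "label": int(failures[user] > max_failures),         # 1 = known abnormal
        })
    return dataset


# Usage: seven consecutive failures for one user, one duplicate and one incomplete record.
raw = [{"timestamp": 60 * i, "username": "bob", "ip": "10.0.0.1", "device": "web",
        "result": "failure"} for i in range(7)]
raw.append(dict(raw[0]))
raw.append({"timestamp": 999, "username": "amy", "ip": None, "device": "ios",
            "result": "success"})
rows = preprocess(raw)
```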
S203: Input the login log data set into an Isolation Forest-based model for training to obtain an abnormal login detection model.
In this embodiment, inputting the login log data set into the Isolation Forest-based model for training to obtain the abnormal login detection model includes: dividing the login log data set into a first training set and a second training set, where the first training set contains both normal and abnormal login samples and the second training set contains only abnormal login samples; inputting the first training set into the Isolation Forest-based model for training to obtain an initial model; and inputting the second training set into the initial model for further training to obtain the abnormal login detection model.
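A minimal sketch of the data split described above, assuming samples are (features, label) pairs with label 1 for abnormal. The text does not say how the abnormal samples are apportioned between the two sets, so the second_fraction parameter is a hypothetical knob.

```python
import random

def split_two_stage(samples, second_fraction=0.5, seed=0):
    """Split (features, label) pairs into the two training sets of S203.

    First set: all normal samples plus part of the abnormal samples.
    Second set: only abnormal samples (the remaining part).
    second_fraction is hypothetical; the text does not fix the ratio.
    """
    normal = [s for s in samples if s[1] == 0]
    abnormal = [s for s in samples if s[1] == 1]
    rng = random.Random(seed)
    rng.shuffle(abnormal)
    k = int(len(abnormal) * second_fraction)
    second = abnormal[:k]
    first = normal + abnormal[k:]
    rng.shuffle(first)  # avoid ordering effects when the first set is consumed
    return first, second
```

Note that a standard Isolation Forest is unsupervised and does not use labels when fitting; how the abnormal-only second set refines the initial model is not specified in the text, so that stage is not sketched here.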
During training, parameters of the Isolation Forest model such as the number of isolation trees, the subsampling method, and the split strategy can be tuned to further improve model performance.
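To make these tunable parameters concrete, here is a compact Isolation Forest written with the Python standard library only. It follows the standard algorithm: each tree is grown on a random subsample of size psi, each split picks a random feature and a uniform random split value between that feature's minimum and maximum, and tree height is limited to ceil(log2 psi). The names MiniIsolationForest, n_trees and psi are illustrative, not taken from the patent. In this formulation a shorter average path length means a more anomalous sample.

```python
import math
import random

EULER_GAMMA = 0.5772156649

def c(n):
    """Average path length of an unsuccessful BST search among n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n

def build_tree(X, depth, limit, rng):
    """Recursively isolate points with random axis-parallel splits."""
    if depth >= limit or len(X) <= 1:
        return {"size": len(X)}
    dim = rng.randrange(len(X[0]))          # random feature
    lo = min(x[dim] for x in X)
    hi = max(x[dim] for x in X)
    if lo == hi:                            # feature is constant here: stop
        return {"size": len(X)}
    split = rng.uniform(lo, hi)             # uniform split value between lo and hi
    return {
        "dim": dim,
        "split": split,
        "left": build_tree([x for x in X if x[dim] < split], depth + 1, limit, rng),
        "right": build_tree([x for x in X if x[dim] >= split], depth + 1, limit, rng),
    }

def path_length(x, node, depth=0):
    """Edges from the root to x's leaf, plus c(size) for unexpanded leaves."""
    if "size" in node:
        return depth + c(node["size"])
    child = node["left"] if x[node["dim"]] < node["split"] else node["right"]
    return path_length(x, child, depth + 1)

class MiniIsolationForest:
    def __init__(self, n_trees=100, psi=256, seed=0):
        self.n_trees = n_trees    # number of isolation trees
        self.psi = psi            # subsample size per tree
        self.rng = random.Random(seed)
        self.trees = []

    def fit(self, X):
        if len(X) < 2:
            raise ValueError("need at least two samples")
        psi = min(self.psi, len(X))
        limit = math.ceil(math.log2(psi))   # height limit of the standard algorithm
        self.trees = [build_tree(self.rng.sample(X, psi), 0, limit, self.rng)
                      for _ in range(self.n_trees)]
        self.c_psi = c(psi)
        return self

    def average_path_length(self, x):
        return sum(path_length(x, t) for t in self.trees) / len(self.trees)

    def score(self, x):
        """Anomaly score in (0, 1]; values close to 1 indicate anomalies."""
        return 2.0 ** (-self.average_path_length(x) / self.c_psi)

if __name__ == "__main__":
    rng = random.Random(1)
    # synthetic "normal" logins clustered around hour 9; second feature is noise
    normal = [[rng.gauss(9, 1), rng.gauss(0, 1)] for _ in range(300)]
    forest = MiniIsolationForest(n_trees=100, psi=128, seed=0).fit(normal)
    print(forest.score([3.0, 6.0]) > forest.score([9.0, 0.0]))  # True
```

Here n_trees and psi are the tree-count and subsampling knobs; a different split strategy would be plugged in by replacing build_tree.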
In this embodiment, dividing the training data and training the model in stages makes abnormal patterns easier to discover and improves the accuracy of anomaly detection. In addition, the abnormal login detection model can learn the characteristics of normal login samples from prior knowledge, which improves its ability to recognize abnormal samples.
Referring to FIG. 3, a third embodiment of the abnormal login warning method according to the embodiments of the present invention includes:
S301: Use Apache Spark together with Kafka to build a stream processing framework that receives login request data in real time.
In this embodiment, Apache Spark and Kafka are installed and configured in advance; a Kafka topic is then created to receive login request data; next, a Spark Streaming job is created that consumes the data stream from that topic; finally, the Spark Streaming job is submitted to the cluster so that it receives and processes login request data from the Kafka topic in real time.
Combining Kafka with Apache Spark yields a high-throughput, low-latency real-time processing pipeline that can respond quickly to large volumes of login request data. Moreover, as a message queue system, Kafka can ingest data from a variety of sources, including application logs and sensor data, which makes the overall system more flexible and general-purpose.
S302: Parse the login request data and split it into multiple fields.
In this embodiment, parsing involves decoding the data format (such as JSON, XML, or CSV) and recognizing specific tags or delimiters.
The parsed data is split into multiple fields. Typical fields may include the user name, login time, IP address, device information, and login result. The user name is the user account involved in the login request. The login time is the timestamp or date-time at which the login request was made. The IP address is the address from which the login request originated. Device information includes the operating system, browser type, and so on. The login result is the outcome of the login request, such as success or failure.
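A minimal parsing sketch for two of the formats mentioned above, JSON and CSV. The key names (username, login_time, ip, device, result) and the CSV column order are hypothetical; real payloads would follow whatever schema the producers write to the Kafka topic.

```python
import csv
import io
import json
from dataclasses import dataclass

@dataclass
class LoginRequest:
    username: str
    login_time: str   # ISO-8601 date-time string
    ip: str
    device: str       # e.g. a user-agent string
    result: str       # "success" or "failure"

FIELDS = ["username", "login_time", "ip", "device", "result"]  # hypothetical order

def parse_json(payload: str) -> LoginRequest:
    """Decode a JSON object and keep only the expected fields."""
    data = json.loads(payload)
    return LoginRequest(**{k: str(data[k]) for k in FIELDS})

def parse_csv_line(line: str) -> LoginRequest:
    """Split one CSV line; csv.reader handles quoted fields containing commas."""
    row = next(csv.reader(io.StringIO(line)))
    if len(row) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(row)}")
    return LoginRequest(*row)
```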
S303: Extract feature data of the login request data from the multiple fields, where the feature data includes time features, geographic location features, device information features, and login behavior features.
In this embodiment, time features are time-related features extracted from the login time field, including the hour, minute, day of the week, and whether the day is a working day. These features reflect the temporal patterns and periodicity of login behavior.
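A minimal sketch of time feature extraction, assuming the login time arrives as an ISO-8601 string. The holidays parameter is a hypothetical set of ISO dates treated as non-working days; a real deployment would use the organization's own calendar.

```python
from datetime import datetime

def time_features(login_time: str, holidays=frozenset()):
    """Hour, minute, day of week (0 = Monday) and a working-day flag."""
    dt = datetime.fromisoformat(login_time)
    is_workday = dt.weekday() < 5 and dt.date().isoformat() not in holidays
    return {
        "hour": dt.hour,
        "minute": dt.minute,
        "weekday": dt.weekday(),
        "is_workday": int(is_workday),
    }
```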
Geographic location features are extracted from the IP address field. An IP address database can be used to map an IP address to a specific geographic location, such as a country or city; this information reveals the geographic distribution of login behavior.
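The IP-to-location mapping can be sketched with the standard ipaddress module over a small range table. The table is hypothetical: the networks are documentation ranges from RFC 5737 paired with placeholder locations, and in practice the table would come from an IP geolocation database whose format the text does not specify. The foreign flag is an illustrative feature for logins from an unusual location.

```python
import ipaddress

# Hypothetical lookup table: (network, country, city). Placeholder data only.
GEO_TABLE = [
    (ipaddress.ip_network("192.0.2.0/24"), "CN", "Beijing"),
    (ipaddress.ip_network("198.51.100.0/24"), "CN", "Shanghai"),
    (ipaddress.ip_network("203.0.113.0/24"), "US", "New York"),
]

def geo_features(ip: str, usual_country=None):
    """Map an IP to (country, city) and flag a country change for the account."""
    addr = ipaddress.ip_address(ip)
    for net, country, city in GEO_TABLE:
        if addr in net:
            break
    else:  # no range matched
        country, city = "UNKNOWN", "UNKNOWN"
    return {
        "country": country,
        "city": city,
        "foreign": int(usual_country is not None and country != usual_country),
    }
```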
Device information features are extracted from the device information field, such as the operating system type, browser type, and device brand; this information helps identify the login behavior of different device types.
Login behavior features are extracted from the login result field, such as the number of successful logins, the number of failed logins, and the number of account lockouts; these features reflect the login behavior characteristics of the account.
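The behavior counters can be maintained per account as login results arrive. A minimal sketch, assuming a hypothetical lockout rule of five consecutive failures; the text mentions lockout counts but not the rule that triggers a lockout.

```python
from collections import defaultdict

LOCKOUT_AFTER = 5  # hypothetical: consecutive failures that lock an account

class BehaviorCounter:
    """Per-account success, failure and lockout counts."""

    def __init__(self):
        self.stats = defaultdict(
            lambda: {"success": 0, "failure": 0, "lockouts": 0, "streak": 0})

    def update(self, username: str, result: str) -> dict:
        s = self.stats[username]
        if result == "success":
            s["success"] += 1
            s["streak"] = 0
        else:
            s["failure"] += 1
            s["streak"] += 1
            if s["streak"] == LOCKOUT_AFTER:  # lockout triggered, start a new streak
                s["lockouts"] += 1
                s["streak"] = 0
        # snapshot of the features, without the internal streak counter
        return {k: v for k, v in s.items() if k != "streak"}
```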
In this embodiment, the feature data of the login request data can be extracted using either a keyword matching algorithm or regular expressions.
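Both extraction approaches can be illustrated on a user-agent style device string: keyword matching for the operating system and a regular expression for the browser name and major version. The keyword table and the pattern are hypothetical examples, not an exhaustive user-agent parser (for instance, browsers that also advertise a Chrome token are reported as Chrome).

```python
import re

# Keyword matching: first hit wins, so more specific tokens are listed first
# (Android user agents also contain "Linux", iPhone ones contain "Mac OS X").
OS_KEYWORDS = [("Android", "Android"), ("iPhone", "iOS"), ("Windows", "Windows"),
               ("Mac OS X", "macOS"), ("Linux", "Linux")]

# Regular expression: leftmost browser token and its major version, e.g. "Chrome/120".
BROWSER_RE = re.compile(r"(Firefox|Chrome|Safari)/(\d+)")

def device_features(device: str) -> dict:
    os_name = next((name for kw, name in OS_KEYWORDS if kw in device), "Other")
    m = BROWSER_RE.search(device)
    return {
        "os": os_name,
        "browser": m.group(1) if m else "Other",
        "browser_major": int(m.group(2)) if m else -1,
    }
```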
In this embodiment, combining Apache Spark and Kafka into a stream processing framework that receives login request data in real time and performs feature extraction improves the efficiency and flexibility of real-time data processing, and provides a reliable data foundation and feature support for the abnormal login detection system.
Referring to FIG. 4, a fourth embodiment of the abnormal login warning method according to the embodiments of the present invention includes:
S401: Obtain the feature vector corresponding to each item of feature data.
In this embodiment, different types of feature data (such as categorical and numerical) are encoded appropriately, for example with one-hot encoding, label encoding, or numerical normalization, to obtain the feature vector corresponding to each item of feature data. The resulting feature vectors are then converted into a suitable data structure, such as a DataFrame or a NumPy array, for input into the abnormal login detection model.
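A minimal sketch of the encoding step using only the standard library, combining two of the options listed above: one-hot encoding for categorical fields and min-max normalization for numerical ones. The column names in the usage are hypothetical. The encoder is fitted on training rows and reused unchanged at inference time, so unseen categories become an all-zero block and out-of-range values are clipped. The resulting lists can be passed to numpy.asarray or a pandas DataFrame if those libraries are available.

```python
def fit_encoder(rows, categorical, numerical):
    """Learn category vocabularies and min/max ranges from training rows (dicts)."""
    vocab = {c: sorted({r[c] for r in rows}) for c in categorical}
    ranges = {n: (min(r[n] for r in rows), max(r[n] for r in rows)) for n in numerical}
    return {"categorical": categorical, "numerical": numerical,
            "vocab": vocab, "ranges": ranges}

def encode(row, enc):
    """One-hot encode categorical fields, then min-max normalize numerical fields."""
    vec = []
    for c in enc["categorical"]:
        # unseen categories map to an all-zero block
        vec.extend(1.0 if row[c] == v else 0.0 for v in enc["vocab"][c])
    for n in enc["numerical"]:
        lo, hi = enc["ranges"][n]
        x = (row[n] - lo) / (hi - lo) if hi > lo else 0.0
        vec.append(min(max(x, 0.0), 1.0))  # clip values outside the training range
    return vec

if __name__ == "__main__":
    train = [{"os": "Windows", "hour": 8, "failure": 0},
             {"os": "Linux", "hour": 20, "failure": 4}]
    enc = fit_encoder(train, ["os"], ["hour", "failure"])
    print(encode({"os": "Windows", "hour": 14, "failure": 2}, enc))  # [0.0, 1.0, 0.5, 0.5]
```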
S402: Input the feature vector corresponding to each item of feature data into the abnormal login detection model.
S403: Obtain the average leaf node height, output by the abnormal login detection model, of the feature vector corresponding to each item of feature data.
In this embodiment, the abnormal login detection model is loaded, and the feature vector corresponding to each item of feature data is input into it one by one. The model performs inference on each input feature vector and outputs the average leaf node height of that feature vector, which indicates the degree of abnormality of the corresponding feature data.
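For reference, the standard Isolation Forest formulation converts the average path length of a sample into an anomaly score as shown below. Here h(x) is the path length of x in one isolation tree (edges from the root to x's leaf), E[h(x)] is its average over all trees, psi is the subsample size per tree, and H(i) is the i-th harmonic number. In that formulation a shorter average path means a more anomalous sample, so a threshold test of the "greater than threshold means abnormal" form used in S501 presupposes an output oriented so that larger values mean more anomalous, such as the score s; the text does not state which orientation its "average leaf node height" uses.

```latex
\begin{aligned}
c(\psi) &= 2H(\psi - 1) - \frac{2(\psi - 1)}{\psi}, \qquad H(i) \approx \ln i + 0.5772156649,\\
s(x, \psi) &= 2^{-E[h(x)] / c(\psi)}, \qquad
s \to 1 \text{ as } E[h(x)] \to 0, \quad
s = 0.5 \text{ when } E[h(x)] = c(\psi), \quad
s \to 0 \text{ as } E[h(x)] \to \psi - 1.
\end{aligned}
```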
In this embodiment, the feature vector corresponding to each item of feature data is input into the abnormal login detection model. Using feature vectors allows different kinds of feature data to be processed uniformly and helps eliminate scale differences between them, making it easier for the anomaly detection model to capture abnormal patterns. Obtaining the average leaf node height output for each feature vector gives a more intuitive view of how the model judges each item of feature data, which improves the interpretability of the model output.
Referring to FIG. 5, a fifth embodiment of the abnormal login warning method according to the embodiments of the present invention includes:
S501: Compare the average leaf node height of the feature vector corresponding to each item of feature data with a preset abnormality threshold; if the average leaf node height is greater than the preset abnormality threshold, determine that the login request is abnormal.
S502: When the login request is determined to be abnormal, calculate the difference between the average leaf node height of the feature vector corresponding to each item of feature data and the preset abnormality threshold, and assign an abnormality level according to the difference.
In this embodiment, by comparing the average leaf node height of each feature vector with the preset abnormality threshold, the system can make a fine-grained judgment on the login request: it not only determines whether an abnormality exists but also assigns different abnormality levels according to the difference.
In this embodiment, the abnormality levels include level-one, level-two, and level-three abnormalities. A level-one abnormality indicates that the login request may be abnormal, a level-two abnormality indicates that it is strongly suspicious, and a level-three abnormality indicates that it is highly suspicious. Level three has the highest degree of abnormality, level two a medium degree, and level one the lowest.
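S501 and S502 can be sketched as a single function. The band widths separating levels one, two and three are hypothetical; the text only states that the level follows from the difference between the model output and the preset threshold, with a larger excess meaning a more severe level. The sketch assumes, as S501 does, that the output is oriented so that larger values mean more anomalous.

```python
def abnormality_level(output: float, threshold: float, bands=(0.05, 0.15)) -> int:
    """Return 0 for a normal request, else abnormality level 1, 2 or 3.

    bands are hypothetical cut-offs on the excess (output - threshold):
    excess <= bands[0] -> level 1, excess <= bands[1] -> level 2, else level 3.
    """
    if output <= threshold:        # S501: not greater than the threshold -> normal
        return 0
    diff = output - threshold      # S502: excess over the threshold
    if diff <= bands[0]:
        return 1
    if diff <= bands[1]:
        return 2
    return 3
```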
S503: Invoke the warning strategy corresponding to the assigned abnormality level.
In this embodiment, a level-one abnormality maps to a low-risk warning strategy, a level-two abnormality to a medium-risk warning strategy, and a level-three abnormality to a high-risk warning strategy. The low-risk strategy includes sending a notification to the user and strengthening identity verification; the medium-risk strategy includes triggering multi-factor authentication and requiring the user to confirm the risk; the high-risk strategy includes immediately locking the account, issuing an emergency alert, and initiating the security response process.
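The level-to-strategy mapping can be expressed as a lookup table. The action identifiers restate the strategies listed above as hypothetical names; what each action actually does (which notification channel, which multi-factor authentication provider) depends on the deployment.

```python
# Abnormality level -> (strategy, actions). Action names are hypothetical.
STRATEGIES = {
    1: ("low", ["notify_user", "strengthen_identity_verification"]),
    2: ("medium", ["trigger_mfa", "require_risk_confirmation"]),
    3: ("high", ["lock_account", "send_emergency_alert", "start_security_process"]),
}

def warning_plan(level: int):
    """Return (strategy, actions); level 0 (normal request) needs no warning."""
    if level == 0:
        return None, []
    return STRATEGIES[level]
```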
In this embodiment, when the average leaf node height of the feature vector corresponding to each item of feature data is not greater than the preset abnormality threshold, the login request is determined to be normal and no warning processing is required.
In this embodiment, when a login request is determined to be abnormal, the difference between the average leaf node height of each feature vector and the preset abnormality threshold is calculated and used to assign an abnormality level, so the system can invoke the warning strategy that matches each level. This enables multi-level warning processing and a more targeted response to anomalies of different severity.
Referring to FIG. 6, a sixth embodiment of the abnormal login warning method according to the embodiments of the present invention includes:
S601: Periodically collect abnormality correction feedback data.
In this embodiment, a feedback system can be set up in advance to let users or administrators submit feedback on abnormal login detection results, for example through a feedback form on a website, an in-app feedback function, or a dedicated email address.
In this embodiment, feedback on abnormal login detection results is retrieved from the feedback system at a preset interval and organized to obtain the abnormality correction feedback data.
S602: Analyze the abnormality correction feedback data to obtain analysis results, including the false alarm rate and the missed alarm rate.
In this embodiment, the false alarm rate is calculated from cases that the system flagged as abnormal but that the feedback data marks as normal logins. The missed alarm rate is calculated from cases that the system did not flag as abnormal but that the feedback data marks as abnormal logins.
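A minimal sketch of both rates, assuming each feedback item pairs the system's flag with the corrected label and using the common definitions false alarm rate = FP / (FP + TN) and missed alarm rate = FN / (FN + TP). The text does not fix the denominators, so other conventions (for example, false alarms divided by all flagged cases) are equally plausible.

```python
def alarm_rates(feedback):
    """feedback: iterable of (flagged_by_system, truly_abnormal) booleans."""
    tp = fp = tn = fn = 0
    for flagged, abnormal in feedback:
        if flagged and not abnormal:
            fp += 1      # system said abnormal, feedback says normal
        elif not flagged and abnormal:
            fn += 1      # system missed a confirmed abnormal login
        elif flagged:
            tp += 1
        else:
            tn += 1
    false_alarm = fp / (fp + tn) if fp + tn else 0.0
    missed_alarm = fn / (fn + tp) if fn + tp else 0.0
    return false_alarm, missed_alarm
```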
In this embodiment, the feedback data can also be grouped by warning level and the actual abnormal cases at each level analyzed, to determine whether the warning levels are divided unreasonably and, if so, to adjust the division.
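Grouping by warning level can be sketched as computing, per level, the share of alerts that feedback confirmed as truly abnormal. A higher level that does not show a higher confirmed share than a lower one is one hypothetical signal that the level boundaries need adjusting; the text does not specify the criterion.

```python
from collections import defaultdict

def confirmed_share_by_level(feedback):
    """feedback: iterable of (level, truly_abnormal) for alerted requests."""
    counts = defaultdict(lambda: [0, 0])  # level -> [confirmed, total]
    for level, abnormal in feedback:
        counts[level][0] += int(abnormal)
        counts[level][1] += 1
    return {lvl: conf / total for lvl, (conf, total) in sorted(counts.items())}

def levels_look_monotonic(shares):
    """True if the confirmed share never decreases as the level increases."""
    ordered = [shares[lvl] for lvl in sorted(shares)]
    return all(a <= b for a, b in zip(ordered, ordered[1:]))
```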
S603: Optimize the parameters of the abnormal login detection model according to the analysis results, and retrain the model with the optimized parameters.
In this embodiment, when optimizing the model parameters according to the analysis results, the weights of the individual features in the abnormal login detection model may be adjusted.
In this embodiment, the optimized model can also be validated and evaluated to confirm that its performance has clearly improved after the correction, for example using cross-validation and the AUC metric.
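AUC can be computed without extra libraries from its rank interpretation: the probability that a randomly chosen abnormal sample receives a higher anomaly score than a randomly chosen normal one, with ties counted as one half. This quadratic-time sketch suits small validation sets and assumes scores are oriented so that larger means more anomalous.

```python
def auc(scores, labels):
    """ROC AUC by pairwise comparison; labels: 1 = abnormal, 0 = normal."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one abnormal and one normal sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))
```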
In this embodiment, analyzing the false alarm rate, the missed alarm rate, and the actual abnormal cases under each warning level allows the model parameters to be adjusted in a targeted way, improving accuracy and robustness. In addition, the model can be optimized and retrained promptly based on the analysis of the feedback data, so the system can adapt quickly to new security threats and attack techniques.
The abnormal login warning method in the embodiments of the present invention has been described above; the device in the embodiments is described below. Referring to FIG. 7, an implementation of the abnormal login warning device in the embodiments of the present invention includes:
a model building module 701, configured to build an Isolation Forest-based abnormal login detection model;
an extraction module 702, configured to receive login request data in real time and extract feature data of the login request data;
an anomaly detection module 703, configured to input the feature data into the abnormal login detection model and obtain the model's output;
an early warning module 704, configured to determine from the output whether the login request is abnormal and, if so, invoke a warning strategy to handle it; and
a model optimization module 705, configured to periodically collect abnormality correction feedback data and optimize the abnormal login detection model according to that data.
In this embodiment, the model building module 701 includes: a first collection unit 7011, configured to collect login log data of historical accounts; a preprocessing unit 7012, configured to preprocess the login log data of the historical accounts to generate a login log data set; and a training unit 7013, configured to input the login log data set into the Isolation Forest-based model for training to obtain the abnormal login detection model.
In this embodiment, the extraction module 702 includes: a real-time receiving unit 7021, configured to use Apache Spark together with Kafka to build a stream processing framework that receives login request data in real time; a parsing unit 7022, configured to parse the login request data and split it into multiple fields; and an extraction unit 7023, configured to extract feature data of the login request data from the multiple fields, the feature data including time features, geographic location features, device information features, and login behavior features.
In this embodiment, the anomaly detection module 703 includes: a first acquisition unit 7031, configured to obtain the feature vector corresponding to each item of feature data; an anomaly detection unit 7032, configured to input the feature vector corresponding to each item of feature data into the abnormal login detection model; and a second acquisition unit 7033, configured to obtain the average leaf node height, output by the model, of the feature vector corresponding to each item of feature data.
In this embodiment, the early warning module 704 includes: a comparison unit 7041, configured to compare the average leaf node height of the feature vector corresponding to each item of feature data with a preset abnormality threshold and, if the average leaf node height is greater than the threshold, determine that the login request is abnormal; a division unit 7042, configured to, when the login request is determined to be abnormal, calculate the difference between the average leaf node height and the preset abnormality threshold and assign an abnormality level according to the difference; and an early warning unit, configured to invoke the warning strategy corresponding to the assigned abnormality level.
In this embodiment, the model optimization module 705 includes: a first collection unit 7051, configured to periodically collect abnormality correction feedback data; an analysis unit 7052, configured to analyze the feedback data to obtain analysis results including the false alarm rate and the missed alarm rate; and an optimization unit 7053, configured to optimize the parameters of the abnormal login detection model according to the analysis results and retrain the model with the optimized parameters.
In this embodiment, receiving data in real time and predicting with the abnormal login detection model allows the system to respond quickly to abnormal login behavior and issue warnings promptly. Because the detection model identifies anomalies with the Isolation Forest algorithm, it can handle large-scale data efficiently and improve detection accuracy. In addition, the model is continuously optimized with collected abnormality correction feedback data so that it adapts to new login behavior patterns, improving adaptability and accuracy.
FIG. 7 above describes the abnormal login warning device in the embodiments of the present invention in detail from the perspective of modular functional entities; the abnormal login warning equipment in the embodiments is described in detail below from the perspective of hardware processing.
FIG. 8 is a schematic structural diagram of abnormal login warning equipment provided by an embodiment of the present invention. The equipment 800 may vary considerably with configuration or performance, and may include one or more central processing units (CPU) 810 (for example, one or more processors), a memory 820, and one or more storage media 830 (for example, one or more mass storage devices) storing application programs 833 or data 832. The memory 820 and the storage medium 830 may provide transient or persistent storage. A program stored in the storage medium 830 may include one or more modules (not shown), each of which may include a series of instruction operations for the equipment 800. Further, the processor 810 may be configured to communicate with the storage medium 830 and execute the series of instruction operations in the storage medium on the equipment 800.
The equipment 800 may further include one or more power supplies 840, one or more wired or wireless network interfaces 850, one or more input/output interfaces 860, and/or one or more operating systems 831, such as Windows Server, Mac OS X, Unix, Linux, and FreeBSD.
An embodiment of the present invention further provides a computer-readable storage medium, which may be non-volatile or volatile. The computer-readable storage medium stores instructions that, when run on a computer, cause the computer to perform the steps of the abnormal login warning method.
Those skilled in the art will clearly understand that, for convenience and brevity of description, the specific working processes of the systems, devices, and units described above can be found in the corresponding processes in the foregoing method embodiments and are not repeated here.
If the integrated unit is implemented as a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. On this understanding, the technical solution of the present invention in essence, or the part of it that contributes over the prior art, or all or part of the technical solution, may be embodied as a software product. The computer software product is stored in a storage medium and includes a number of instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of the present invention. The storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, read-only memory (ROM), random access memory (RAM), a magnetic disk, or an optical disc.
As stated above, the foregoing embodiments are intended only to illustrate, not to limit, the technical solutions of the present invention. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in those embodiments may still be modified, or some of their technical features may be replaced with equivalents, and such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.
Claims (10)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311832453.6A CN117972581A (en) | 2023-12-27 | 2023-12-27 | Abnormal login early warning method, device, equipment and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN117972581A true CN117972581A (en) | 2024-05-03 |
Family ID: 90863930
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202311832453.6A Pending CN117972581A (en) | 2023-12-27 | 2023-12-27 | Abnormal login early warning method, device, equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117972581A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN119579122A (en) * | 2025-02-08 | 2025-03-07 | 泉州行创网络科技有限公司 | An enterprise attendance management system based on big data analysis |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111177095B (en) | Log analysis method, device, computer equipment and storage medium | |
US11841786B2 (en) | Predictive anomaly detection framework | |
US11258814B2 (en) | Methods and systems for using embedding from Natural Language Processing (NLP) for enhanced network analytics | |
US11354583B2 (en) | Automatically generating rules for event detection systems | |
CN111597247A (en) | Data anomaly analysis method and device and storage medium | |
CN113918367B (en) | A large-scale system log anomaly detection method based on attention mechanism | |
CN112084055A (en) | Fault locating method, device, electronic device and storage medium for application system | |
US20210097433A1 (en) | Automated problem detection for machine learning models | |
US20230376372A1 (en) | Multi-modality root cause localization for cloud computing systems | |
CN111667141A (en) | Pending task case processing method, device, equipment and storage medium | |
CN113112038B (en) | Intelligent monitoring and diagnostic analysis system, device, electronic equipment and storage medium | |
CN119675988B (en) | Big data-driven network operation status monitoring and management method and system | |
CN111913824A (en) | Method for determining data link fault reason and related equipment | |
CN118484356A (en) | A server status monitoring method and system based on RPA | |
CN117972581A (en) | Abnormal login early warning method, device, equipment and storage medium | |
CN117112339A (en) | Abnormality detection method, abnormality detection device, electronic device, and computer program product | |
CN119988157B (en) | A data collection method and system based on big data of intelligent operation and maintenance platform | |
Shih et al. | Implementation and visualization of a netflow log data lake system for cyberattack detection using distributed deep learning. | |
CN107579944B (en) | Artificial intelligence and MapReduce-based security attack prediction method | |
CN118709184B (en) | Malicious code escape detection method and device | |
CN119621549A (en) | System abnormality positioning notification method, device, computer equipment, and storage medium | |
US12205039B1 (en) | Group masked autoencoder for anomaly detection | |
CN119938365A (en) | Log processing method, device and equipment | |
Li et al. | Event block identification and analysis for effective anomaly detection to build reliable HPC systems | |
CN117435441B (en) | Log data-based fault diagnosis method and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination |