
HK40013358A - Continuous learning for intrusion detection - Google Patents


Info

Publication number
HK40013358A
HK40013358A (application number HK62020002818.6A)
Authority
HK
Hong Kong
Prior art keywords
signals
security
malicious
benign
signal
Prior art date
Application number
HK62020002818.6A
Other languages
Chinese (zh)
Inventor
P. Luo
R. H. Briggs
N. Ahmed
Original Assignee
Microsoft Technology Licensing, LLC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Microsoft Technology Licensing, LLC
Publication of HK40013358A

Description

Continuous learning for intrusion detection
Background
Computer networks are subject to constant threats from malicious parties attempting to gain unauthorized access to the systems hosted on them. The strategies a malicious party uses to attack a network and the strategies a network administrator uses to defend it evolve in response to one another; new vulnerabilities are added to the malicious party's arsenal, and vulnerabilities that no longer work are discarded. However, implementing countermeasures is typically reactive, in that a network administrator must wait for the latest vulnerability to be identified before deploying countermeasures, and must later determine when to stop deploying those countermeasures once the corresponding vulnerability is no longer in use. Correctly identifying and blocking up-to-date vulnerabilities is often challenging for network administrators, especially when a vulnerability has not yet been widely exploited or attacks only a small number of the services offered on the network.
Disclosure of Invention
This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the detailed description. This summary is not intended to identify all key or essential features of the claimed subject matter, nor is it intended as an aid in determining the scope of the claimed subject matter.
Systems, methods, and computer storage devices including instructions for providing continuous learning for intrusion detection are provided herein. A plurality of machine learning models are continually retrained on network signals, based on signals representing attacks and benign behaviors collected from machines and devices within the network. A rolling window is used to collect the signals so that the models use the most recent data to identify attacks, and models are continually promoted and demoted for protecting the network as their ability to accurately detect attacks fluctuates in response to the composition of the most recent data. Models deployed in a live production network can provide their detections in near real-time to a security analyst, who provides feedback on the accuracy of the models (e.g., missed intrusions, false positives, correctly identified intrusions) to further refine how the models are trained.
To improve the reliability of the training data set used to continuously retrain and refine the detection models, and thereby improve the models themselves, the attack signals are balanced to account for their scarcity relative to benign signals and with respect to specific attack types. Benign signals are superimposed with attack signals of various types from other machines to provide a balanced training set for training and refining the models. Among the attack signals in the balanced training set, the signals of the various attack types are also balanced, to ensure that the models are trained equally on all attack types. Features of the signals are extracted dynamically via a text-based configuration, increasing the flexibility of the models to respond to different sets of features indicative of attacks on the network.
In various aspects, attacks are simulated by a known internal attacker to improve the readiness of the network and to generate additional attack signals. Similarly, in some aspects historically important attack signals are used, so that certain types of attack signals are presented to the models even if those signals have not been observed within the rolling window.
By providing a network with a continuously learning intrusion detection model, the functionality of devices and software in the network is improved. New forms of attacks are identified more quickly and reliably, solving the computer-centric problem of how to improve network security. In addition, computational resources are not wasted in attempting to detect forms of obsolete attacks, thereby reducing processing resources used to protect the network from malicious parties.
Examples are implemented as a computer process, a computing system, or as an article of manufacture (e.g., an apparatus, a computer program product, or a computer readable medium). According to one aspect, the computer program product is a computer storage medium readable by a computer system and encoding a computer program of instructions for executing a computer process.
The details of one or more aspects are set forth in the accompanying drawings and the description below. Other features and advantages will be apparent from a reading of the following detailed description and a review of the associated drawings. It is to be understood that the following detailed description is illustrative only and is not restrictive of the claims.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this disclosure, illustrate various aspects. In the drawings:
FIG. 1A is an example security system that can be used to practice the present disclosure;
FIG. 1B is an example model training and selection system for use with the example security system of FIG. 1A that may be used to practice the present disclosure;
FIG. 2 is a flow diagram showing the general stages involved in an example method for developing a training data set by which to train a predictive model for protecting an online service;
FIG. 3 is a flow diagram showing the general stages involved in an example method for training and selecting predictive models for protecting online services; and
FIG. 4 is a block diagram illustrating example physical components of a computing device.
Detailed Description
The following detailed description refers to the accompanying drawings. Wherever possible, the same reference numbers are used in the drawings and the following description to refer to the same or similar elements. While examples may be described, modifications, adaptations, and other implementations are possible. For example, substitutions, additions, or modifications may be made to the elements illustrated in the drawings, and the methods described herein may be modified by substituting, reordering, or adding stages to the disclosed methods. The following detailed description is, therefore, not to be taken in a limiting sense, and the scope of the present invention is defined by the appended claims. Examples may take the form of a hardware implementation, an entirely software implementation, or an implementation combining software and hardware aspects.
Systems, methods, and computer-readable storage devices comprising instructions for providing increased network security via continuous learning intrusion detection models are discussed herein. By providing a network with a continuously learning intrusion detection model, the functionality of devices and software in the network is improved. New forms of attacks are identified more quickly and reliably, solving the computer-centric problem of how to improve network security. In addition, computational resources are not wasted in attempting to detect forms of obsolete attacks, thereby reducing processing resources used to protect the network from malicious parties.
Fig. 1A is an example security system 100 that can be used to practice the present disclosure. As shown in Fig. 1A, an online service 110 is connected to by various users (who may be benign or malicious) and by the security system 100. The online service 110 represents a collection of networked computing devices (e.g., a cloud data center) that provide "cloud" services to various users, including but not limited to: infrastructure as a service (IaaS), in which a user provides the operating system and software running on the devices of the online service 110; platform as a service (PaaS), in which the user provides the software and the online service 110 provides the operating system and devices; or software as a service (SaaS), in which the online service 110 provides both the operating system and the software to run for the user's devices. A user attempting to access the online service 110 may be a legitimate user, or a malicious party that exploits a security vulnerability to break into the online service 110 in order to run unauthorized processes and/or retrieve data from the online service 110 without legitimate authorization.
To determine whether a user is benign or malicious, or whether a device is secure (sending no malicious signals) or corrupted (sending malicious signals), various security signals 115 from the online service 110 are collected and fed into the production models 120 to produce detection results 125 indicating whether a given session is malicious or benign. The security signals 115 include event logs, network traces, system commands, etc., which are analyzed by the production models 120 for features, and feature values, indicative of malicious or benign behavior as determined via training of the production models 120. For purposes of this disclosure, a particular security signal 115 is referred to as "malicious" or "benign" based on the actions in the online service 110 associated with generating that security signal 115. Also as used herein, a "feature" is a numeric attribute, derived from one or more input signals, relating to a characteristic or behavior observed in the network, and a "model" is an algorithm that accepts a set of features (also referred to as model features) defined by an analyst and converts the values of those features into a prediction score: a confidence that the features are indicative of malicious or benign activity.
The security signals 115 are provided to the production models 120, which extract from the security signals 115 the various features on which the production models 120 have been trained to identify malicious activity on the online service 110. A security signal 115 is a collection of one or more related events occurring on devices within the online service 110, and may include several features (e.g., ports used, IP addresses connected to, device identifiers/types from which signals are received, users, actions taken) from which a subset is extracted for examination by the production models 120 to determine whether the security signal 115 is benign or malicious. Features from one or more security signals 115 are combined into a feature vector for analysis, and in various aspects the features may be scored to provide a numeric analysis of the features for input to the production models 120.
For example, a given IP (Internet Protocol) address may be scored based on its frequency of use, where using the given IP address more frequently during the rolling window 130 will change the value presented to the production models 120 compared to using it less frequently. Alternatively, if a sensitive file is accessed, a prohibited action is taken, a blacklisted IP address is contacted, etc., a binary score indicating that the hazardous condition has occurred may be provided in the security signal 115 to the production models 120. The production models 120 do not rely on whitelists or blacklists, however; their training is tied to the features observed in the security signals 115, as discussed in more detail with respect to Figs. 1B, 2, and 3, so that they can learn over time the features indicative of intrusion into the online service 110 without the direction of a blacklist or whitelist.
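The two scoring styles just described (a frequency-based score and a binary hazard score) might be combined into a feature vector along the following lines. This is a minimal sketch, not the patented implementation; the signal layout and the `BLACKLIST` set are hypothetical.

```python
from collections import Counter

BLACKLIST = {"203.0.113.9"}  # hypothetical blacklisted IP address

def feature_vector(signals):
    """Combine signals observed in the rolling window into one feature vector.

    Each signal is a dict such as {"ip": "198.51.100.7"}.  Returns
    [top_ip_share, blacklist_contacted], where the first feature is a
    frequency score in [0, 1] and the second is a binary hazard score.
    """
    ip_counts = Counter(s["ip"] for s in signals)
    # Frequency score: share of signals attributable to the most-used IP.
    top_share = max(ip_counts.values()) / len(signals)
    # Binary score: 1 if any contacted IP is on the (hypothetical) blacklist.
    blacklisted = int(any(s["ip"] in BLACKLIST for s in signals))
    return [top_share, blacklisted]
```

A model would receive such vectors as input and learn, through training, how much weight each feature deserves, rather than acting on the blacklist directly.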
For a given security signal 115, the determination produced by the production models 120 specifies whether the security signal 115 under consideration is benign or malicious. These detection results 125 are associated with the security signals 115 in order to identify them as malicious or benign. In some aspects, at least some of the detection results 125 (e.g., malicious detection results 125) are provided to an analyst user, who may act on the detection results 125 to deploy countermeasures against the malicious user or attack, or determine that a detection result 125 warrants a different evaluation than that indicated by the production models 120. For example, when the production models 120 produce a false negative, the analyst may evaluate that the signal is in fact malicious and indicate an action that should be taken. In another example, when a false positive is indicated for a benign signal, the analyst may evaluate that the signal is actually benign and indicate that no action should be taken. In yet another example, when a true positive for a malicious action is indicated, the analyst may indicate that no action be taken, or that an action different from the one recommended by the security system 100 be taken. These corrections from the analysts are then used for further training and improvement of the models.
In various aspects, the detection results 125 are also fed to a database storing a rolling window 130 of security signals 115 observed over the past d days (where d is configurable by an analyst user or another network administrator, e.g., two days, ten days, fifteen days, etc.), and to a database storing historical signals 135: security signals 115 that should be used for training regardless of whether they were seen in the past d days. The historical signals 135 are curated by an analyst user to include security signals 115 associated with known external attacks. In a further aspect, the analyst user curates the historical signals 135 to include benign signals that may appear suspect or otherwise return false positives for network intrusion, to ensure that the predictive models are trained to respond correctly to signals that have historically proven difficult to identify correctly.
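A d-day rolling window of this kind could be sketched as a small container that prunes signals older than the configured period; the class and method names here are illustrative only, not part of the disclosure.

```python
from datetime import datetime, timedelta

class RollingWindow:
    """A d-day rolling window over observed security signals."""

    def __init__(self, days):
        self.days = days      # d, configurable by an analyst user
        self._entries = []    # (timestamp, signal) pairs

    def add(self, timestamp, signal):
        self._entries.append((timestamp, signal))

    def current(self, now):
        """Return the signals still inside the window, pruning any signal
        collected before the window's specified time period."""
        cutoff = now - timedelta(days=self.days)
        self._entries = [(t, s) for (t, s) in self._entries if t >= cutoff]
        return [s for (_, s) in self._entries]
```

As new security signals arrive they are added, and signals that fall out of the last d days are continuously removed, matching the behavior described for operation 230 below.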
The automated attacker 140 uses known attack patterns and vulnerabilities to test the security of the online service 110, and provides known results for use in conjunction with the detection results 125 generated by the production models 120. When the detection result 125 for a security signal 115 resulting from an attack by the automated attacker 140 does not identify the attack as malicious, that security signal 115 will nevertheless be treated as malicious, because the automated attacker 140 indicates that it is malicious. In various aspects, the automated attacker 140 is an optional component of the security system 100 or the online service 110.
The security signals 115 (including those in the historical signals 135 and the rolling window 130, when available) are fed to the signal splitter 145, along with the detection results 125 from the production models 120 (and corrections from the analyst user) indicating whether each security signal 115 was determined to be benign or malicious. Similarly, in aspects in which an automated attacker 140 is deployed, the benign/malicious identity of the security signals 115 generated from its actions on the online service 110 is provided to the signal splitter 145. The signal splitter 145 is configured to divide the security signals 115 into benign signals, which are provided to the benign signal balancer 150, and malicious signals, which are provided to the attack signal balancer 155.
The benign signal balancer 150 and the attack signal balancer 155 develop the set of security signals 115 that populates the data set used by the training data bootstrapper 160 to provide balanced benign and malicious signals through which the models are trained to detect the latest vulnerabilities of the online service 110. The training data bootstrapper 160 removes benign signals received from corrupted devices in the online service 110, leaving only the malicious signals from the corrupted devices. Benign signals from clean devices are then cross-joined with the malicious signals from corrupted devices, producing B×M attack examples, where B represents the number of benign examples and M represents the number of malicious examples. This operation produces an expanded data set that superimposes attack examples onto benign examples, as if each attack had occurred on a clean device.
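The bootstrapping step above can be sketched as follows. The (device, label, signal) tuple layout is a hypothetical simplification of the signals described in the text.

```python
from itertools import product

def bootstrap_training_set(examples):
    """Expand the signal set into B x M attack examples.

    `examples` is a list of (device, label, signal) tuples with label
    "benign" or "malicious".  Benign signals from corrupted devices (any
    device that produced a malicious signal) are dropped, and the surviving
    benign signals are cross-joined with the malicious ones, superimposing
    each attack onto each clean baseline.
    """
    corrupted = {device for (device, label, _) in examples if label == "malicious"}
    benign = [signal for (device, label, signal) in examples
              if label == "benign" and device not in corrupted]
    malicious = [signal for (_, label, signal) in examples if label == "malicious"]
    return [(b, m) for b, m in product(benign, malicious)]  # B x M pairs
```

With B clean benign signals and M malicious signals, the result contains B×M synthetic attack scenarios, as described above.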
Cross-joining the two data sets creates many scenarios with a large number of variants, since clean devices exhibit different variants of benign signals and corrupted devices exhibit different variants of attack signals. However, if scenarios are chosen randomly (e.g., by the automated attacker 140), there may be an unequal number of each attack type in the training set, which can skew the training of the models (resulting in some attacks being predicted better than others). Thus, the attack examples are balanced across attack scenarios to ensure that the number of examples of each attack type in the training set is substantially equal (e.g., ±5%). In various aspects, underrepresented attack types (i.e., those whose count falls below the balanced number) have existing malicious signals replicated to increase their relative numbers, and/or overrepresented attack types (i.e., those whose count exceeds the balanced number) have existing malicious signals deleted or replaced/overwritten with instances of underrepresented attack types, to arrive at a balanced set of attack examples.
Similarly to the malicious signals, the benign signals are balanced with respect to one another in terms of the type or role of the device from which the signals are received, so that a given device type or role is not overrepresented in the training data set (which would result in some attacks being predicted better on that device type/role than on others). Thus, the benign examples are balanced across the available device types to ensure that the number of benign examples from each device type is substantially equal (e.g., ±5%). In various aspects, underrepresented device types (i.e., those whose count falls below the balanced number) have existing benign signals replicated to increase their relative numbers, and/or overrepresented device types (i.e., those whose count exceeds the balanced number) have existing benign signals deleted or replaced/overwritten with benign instances from underrepresented device types, to arrive at a balanced set of benign examples.
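One of the balancing options described above (replicating members of underrepresented groups) can be sketched generically, so that the same helper serves both the attack signal balancer (grouping by attack type) and the benign signal balancer (grouping by device type). The function and its parameters are illustrative assumptions, not the disclosed implementation.

```python
import random

def balance(examples, key, seed=0):
    """Balance examples so that every group has an equal count, by
    replicating members of underrepresented groups."""
    rng = random.Random(seed)
    groups = {}
    for ex in examples:
        groups.setdefault(key(ex), []).append(ex)
    target = max(len(members) for members in groups.values())
    balanced = []
    for members in groups.values():
        padded = list(members)
        while len(padded) < target:
            padded.append(rng.choice(members))  # replicate an existing example
        balanced.extend(padded)
    return balanced
```

For attack signals, `key` would return the attack type; for benign signals, the device type or role.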
FIG. 1B is an example model training and selection system 105 for use with the example security system 100 of FIG. 1A that may be used to practice the present disclosure. The balanced data set of benign and malicious signals from the training data bootstrapper 160 is provided to the training/test splitter 165 to train and evaluate the various models through which the online service 110 is protected. The data set is divided into k subsets, where k−1 of the subsets (e.g., two-thirds of the data) are used to train the models and one subset (e.g., one-third of the data) is held back to evaluate the models. In various aspects, different fractions are contemplated for splitting the data set into a training subset and an evaluation subset, which are provided to the model trainer 170 and the model evaluator 175, respectively.
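The split performed by the training/test splitter 165 might look like the following sketch, assuming a shuffled data set and k equal folds (the function name and seeding are hypothetical).

```python
import random

def train_test_split(dataset, k=3, seed=0):
    """Shuffle the balanced data set and divide it into k subsets, using
    k-1 of them for training and holding one back for evaluation."""
    data = list(dataset)
    random.Random(seed).shuffle(data)
    fold = len(data) // k
    return data[fold:], data[:fold]  # (training subset, evaluation subset)
```

With k=3 this yields the two-thirds/one-third division given as the example in the text.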
The model trainer 170 is configured to train a plurality of development models 180 on the training subset of the balanced data via one or more machine learning techniques. Machine learning techniques train a model to make accurate predictions on the data fed into it (e.g., whether a security signal 115 is benign or malicious; whether a noun refers to a person, place, or thing; what the weather will be on a given day). During a learning phase, the models are developed against a training data set of known inputs (e.g., sample A, sample B, sample C) to optimize the models to correctly predict the output for a given input. In general, the learning phase may be supervised, semi-supervised, or unsupervised, indicating decreasing degrees to which the "correct" outputs are provided along with the training inputs. In a supervised learning phase, all of the outputs are provided to the model, and the model is directed to develop a general rule or algorithm that maps the inputs to the outputs. In contrast, in an unsupervised learning phase, the desired outputs are not provided for the inputs, so that the model may develop its own rules to discover relationships within the training data set. In a semi-supervised learning phase, an incompletely labeled training set is provided, in which some of the outputs are known and some are unknown for the training data set.
A model may be run against the training data set for several epochs, in which the training data set is repeatedly fed into the model to refine its results. For example, in a supervised learning phase, a model is developed to predict the output for a given set of inputs, and is evaluated over several epochs to more reliably provide the output specified as corresponding to the given input for the greatest number of inputs in the training data set. In another example, for an unsupervised learning phase, a model is developed to cluster the data set into n groups, and is evaluated over several epochs as to how consistently it places a given input into a given group and how reliably it produces the n desired clusters across each epoch.
In various aspects, cross-validation is applied on top of each training epoch, in which a portion of the training data set is used as an evaluation data set. For example, the training data set may be split into k segments, where (k−1) segments are used for a training epoch and the remaining segment is used to determine the performance of the trained model. In this way, each model is trained for each available combination of input parameters such that each model is trained k times, and the best model parameters are selected based on their average performance across these runs.
Once an epoch has been run, the models are evaluated and the values of their variables are adjusted in an attempt to better refine the models. In various aspects, the evaluation is biased against false negatives, biased against false positives, or evenly biased with respect to the overall accuracy of the model. The values may be adjusted in several ways depending on the machine learning technique used. For example, in a genetic or evolutionary algorithm, the values of the models most successful in predicting the desired outputs are used to develop values for the models to use during the subsequent epoch, which may include random variation/mutation to provide additional data points. One of ordinary skill in the art will be familiar with several other machine learning algorithms that may be applied with the present disclosure, including linear regression, random forests, decision tree learning, neural networks, and the like.
Each model develops a rule or algorithm over several epochs by varying the values of one or more variables affecting the inputs to map more closely to a desired result; but because the training data set may be varied, and is preferably very large, perfect accuracy and precision may not be achievable. The number of epochs that make up a learning phase may therefore be set as a given number of trials or a fixed time/computing budget, or the phase may be terminated before that number/budget is reached when the accuracy of a given model is sufficiently high or low, or when an accuracy plateau has been reached. For example, if the training phase is designed to run n epochs and to produce a model with at least 95% accuracy, and such a model is produced before the nth epoch, the learning phase may end early and use the produced model, which satisfies the end-goal accuracy threshold. Similarly, if a given model is insufficiently accurate to satisfy a random-chance threshold (e.g., the model is only 55% accurate in determining true/false outputs for given inputs), the learning phase for that model may be terminated early, although other models in the learning phase may continue being trained. Similarly, when a given model continues to provide similar accuracy or vacillating results across multiple epochs (a performance plateau has been reached), the learning phase for the given model may terminate before the epoch number/computing budget is reached.
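The three early-termination conditions above can be sketched as a training loop. The thresholds mirror the examples in the text (95% target, 55% random-chance cutoff), while `patience` and `tolerance` are hypothetical knobs for detecting a plateau.

```python
def run_learning_phase(train_one_epoch, n_epochs, target=0.95,
                       chance=0.55, patience=3, tolerance=0.005):
    """Run up to n_epochs, ending the learning phase early when a model is
    accurate enough, no better than chance, or stuck on a plateau.

    `train_one_epoch` trains the model for one epoch and returns its
    accuracy on the evaluation data.
    """
    history = []
    for _ in range(n_epochs):
        accuracy = train_one_epoch()
        history.append(accuracy)
        if accuracy >= target:
            break  # end-goal accuracy met: stop early and keep the model
        if accuracy <= chance:
            break  # no better than guessing: give up on this model
        recent = history[-patience:]
        if len(recent) == patience and max(recent) - min(recent) < tolerance:
            break  # performance plateau reached
    return history
```

Other models in the same learning phase would continue training in their own loops even when one model stops early.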
Once the learning phase is complete, the models are finalized. The finalized models are evaluated against testing criteria. In a first example, a test data set that includes known outputs for its inputs is fed into the finalized models to determine the accuracy of each model in handling data on which it has not been trained. In a second example, a false positive rate or a false negative rate may be used to evaluate the models after finalization. In a third example, a delineation between data clusters is used to select the model producing the clearest boundaries for its clusters of data. In other examples, additional metrics of the models are evaluated, such as the areas under the precision and recall curves.
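The accuracy, false positive rate, and false negative rate mentioned above can be computed from a finalized model's predictions on held-out test data; this sketch assumes simple string labels, with "malicious" as the positive class.

```python
def error_rates(predictions, labels):
    """Evaluate a finalized model on held-out test data.

    Returns (accuracy, false_positive_rate, false_negative_rate), where
    "positive" means the model flagged a signal as malicious.
    """
    tp = fp = tn = fn = 0
    for predicted, actual in zip(predictions, labels):
        if predicted == "malicious":
            if actual == "malicious":
                tp += 1   # true positive: attack correctly flagged
            else:
                fp += 1   # false positive: benign signal flagged
        else:
            if actual == "benign":
                tn += 1   # true negative: benign signal passed
            else:
                fn += 1   # false negative: attack missed
    accuracy = (tp + tn) / len(labels)
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    fnr = fn / (fn + tp) if (fn + tp) else 0.0
    return accuracy, fpr, fnr
```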
The development models 180 (and therefore the production models 120) are predictive models that are initially developed by the model feature configurator 185 based on selections made by an administrative user. The administrative user selects one or more characteristics of the security signals 115 to listen for on the devices of the online service 110, and how those characteristics are to be analyzed to indicate whether a given security signal 115 is malicious or benign. In various aspects, the features are provided in a structured text file (e.g., using extensible markup language (XML) or JavaScript object notation (JSON) tags), from which the administrative user can make selections to define the feature set of a new development model 180. Based on the feature configuration, features are dynamically extracted from a given device's set of security signals as a feature vector. Different features may be extracted for different models based on their respective feature configurations. The structured text file thus allows administrative users to add features to a model, or modify how the features are examined, without adding or modifying code in a code base; the structured text file calls code segments from a code library, which a developer can extend or modify to deliver new feature types for the administrative users to select. For example, an administrative user may select the following as types of feature checks to use with a given parameter or data field from the security signals 115: a count of the distinct values in a data set (Count), a maximum value in a data set (Max), a count of the most frequently occurring value in a list (MaxCount), a maximum sum of the values in a list not exceeding a limit (MaxSum), etc.
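The configuration-driven extraction described above might be sketched as follows, using JSON as the structured text format. The configuration layout and field names are hypothetical, and only three of the named feature types are shown (MaxSum is omitted for brevity).

```python
import json
from collections import Counter

# A small "code library" mapping feature-type names from the configuration
# (Count, Max, MaxCount, ...) to extractors over one data field's values.
EXTRACTORS = {
    "Count":    lambda values: len(set(values)),                      # distinct values
    "Max":      lambda values: max(values),                           # largest value
    "MaxCount": lambda values: Counter(values).most_common(1)[0][1],  # count of commonest value
}

# Hypothetical structured text file an administrative user might edit.
CONFIG = json.loads("""
[
  {"field": "port",  "type": "Count"},
  {"field": "bytes", "type": "Max"},
  {"field": "ip",    "type": "MaxCount"}
]
""")

def extract_features(signals, config=CONFIG):
    """Dynamically build a feature vector as directed by the configuration."""
    return [EXTRACTORS[entry["type"]]([s[entry["field"]] for s in signals])
            for entry in config]
```

Adding a feature then means editing the text file, not the code; a developer only touches the code library when a genuinely new feature type is needed.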
Examples of data fields/parameters to be observed in the security signal include, but are not limited to: signal type (e.g., data leak, login attempt, access request for a given file), port used, bytes used in a process/communication, bytes transferred to/from a given Internet Protocol (IP) address and port tuple, user identifier, whether a given IP address or action is on a blacklist or whitelist, etc.
The model evaluator 175 is configured to evaluate the development models 180 to determine which models will be used as production models 120 in the security system 100. In various aspects, the production models 120 are folded back into the development models 180 for evaluation, or an accuracy threshold for the production models 120 is used to determine whether to replace a given production model 120 with a development model 180. In other aspects, the development models 180 are compared against the production models 120 on other metrics (e.g., accuracy, area under precision curve, area under recall curve, etc.), with the best models selected as the promoted models 190 for use as production models 120. As the model evaluator 175 determines the effectiveness of the models at correctly identifying malicious signals as malicious and benign signals as benign, models may be continuously promoted from development models 180 to production models 120 (and demoted from production models 120 to development models 180). In various aspects, the top n most accurate development models 180, or all development models 180 exceeding an accuracy threshold, are promoted as promoted models 190 to production models 120. In other aspects, an administrative user may manually promote a development model 180 to a production model 120, for example, when no other model is monitoring a given characteristic of the security signals 115.
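The two promotion policies above (all models over an accuracy threshold, or the top n most accurate) might be sketched as a single selection function; its name and signature are illustrative assumptions.

```python
def select_production(models, threshold=0.95, n=None):
    """Promote development models to production based on evaluated accuracy.

    `models` maps model name -> accuracy on the held-out evaluation subset.
    If n is given, only the top n most accurate models are promoted;
    otherwise every model at or above the accuracy threshold is.  All
    remaining models stay in (or are demoted to) development.
    """
    ranked = sorted(models.items(), key=lambda kv: kv[1], reverse=True)
    if n is not None:
        promoted = [name for name, _ in ranked[:n]]
    else:
        promoted = [name for name, accuracy in ranked if accuracy >= threshold]
    demoted = [name for name, _ in ranked if name not in promoted]
    return promoted, demoted
```

Rerunning the selection after each retraining cycle gives the continuous promotion and demotion behavior described in the text.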
The security system 100, model training and selection system 105, and their respective component elements are examples of a number of computing systems, including but not limited to: desktop computer systems, wired and wireless computing systems, mobile computing systems (e.g., mobile phones, netbooks, tablet or slate computers, notebook computers, and laptop computers), handheld devices, multiprocessor systems, microprocessor-based or programmable consumer electronics, minicomputers, printers, and mainframe computers. The hardware of these computing systems is discussed in more detail with respect to FIG. 4.
Although the constituent elements of the security system 100 and the model training and selection system 105 are shown as being remote from one another for illustrative purposes, it should be noted that several configurations are possible in which one or more of these devices are hosted locally to another illustrated device, and each illustrated device may represent multiple instances of that device. Various servers and intermediaries familiar to those of ordinary skill in the art may be located between the constituent elements shown in fig. 1A and 1B in order to route communications between those systems, which are not shown so as not to distract from the novel aspects of the present disclosure.
FIG. 2 is a flow diagram illustrating the general stages involved in an example method 200 for developing a training data set with which to train a predictive model for protecting an online service 110. The method 200 begins with operation 210, where the security signals 115 are collected. In various aspects, the security signals 115 may be received in real time (or near real time, allowing for processing and transmission delays), or may be received and cached in a database for periodic review, e.g., reviewing the security events every m minutes as a batch process. The security signals 115 include the events and parameters listened for among the various actions occurring on the machines in the online service 110.
The listened-for events and parameters are used at operation 220 to identify whether a given security signal 115 corresponds to a malicious or a benign action. In various aspects, the collected security signals 115 are fed to the predictive models (i.e., the production models 120) designated for the live online service 110 to determine whether each security signal 115 is malicious or benign. These determinations are presented to an analyst user, who may not only take action based on them to protect the online service 110 from attack by a malicious party, but may also override the determinations made by the predictive models, indicating whether a determination was a false positive or a false negative. Similarly, in aspects where an automated attacker 140 is used to simulate attacks on the online service 110, the automated attacker 140 provides a notification identifying the security signals 115 produced in response to its attacks as malicious, such that those security signals 115 are treated as malicious regardless of the detection results from the predictive models.
At operation 230, the rolling window 130 is set to define a time frame, measured back from the current time, in which the security signals 115 related to the latest vulnerabilities and attacks performed on the online service 110 are analyzed. The rolling window 130 defines a set of security signals 115 that fall within a specified time period from the current time, i.e., the security signals 115 collected over the last d days. A multi-day window is used so that the predictive model can be trained to detect slow attacks that are performed over multiple days to avoid detection by traditional security systems. As security signals 115 are collected, the most recent security signals are added to the set of security signals 115 in the rolling window 130, and security signals 115 collected before the specified time period of the rolling window 130 are continuously removed from the set.
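A minimal sketch of the d-day rolling window, assuming timestamped signal records; the field names and window length are illustrative, not taken from the patent:

```python
# Rolling window sketch: new signals are appended and signals older than
# d days are evicted from the front. Timestamps here are plain day numbers.
from collections import deque

WINDOW_DAYS = 7  # "d" days; an arbitrary choice for the example

def update_window(window, new_signals, now):
    window.extend(new_signals)                        # most recent signals added
    while window and now - window[0]["t"] > WINDOW_DAYS:
        window.popleft()                              # evict signals outside the window
    return window

w = deque()
update_window(w, [{"t": 0}, {"t": 3}], now=3)
update_window(w, [{"t": 10}], now=10)                 # the t=0 signal now falls outside 7 days
```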
In some aspects, historical signals 135 are optionally received at operation 240. The historical signals 135 are curated by the analyst user from previously observed security signals 115 to include historically significant security signals 115 that represent certain attack types or benign use cases designated for training purposes, regardless of whether similar attacks or use cases have been seen within the time period of the rolling window 130. In one example, a historically dangerous vulnerability may have the security signals 115 related to its detection added to the historical signals 135 so that the online service 110 remains protected from that vulnerability at all times. In another example, a developer may discover a zero-day vulnerability, not knowing whether a malicious party is using it, and provide example security signals 115 that mimic the actions of the zero-day vulnerability to serve as historical signals 135 to proactively guard against the vulnerability even if it has never been seen in the wild. In yet another example, a security signal 115 that often results in false positives may be added to the historical signals 135 to ensure that the predictive model is trained on that particular security signal 115. If available, the historical signals 135 are added to the collection of security signals 115 collected within the rolling window 130.
Proceeding to operation 250, the method 200 balances the collected malicious and benign signals that fall within the rolling window 130, along with any historical signals 135 added to the set at optional operation 240. When balancing the malicious signals, the attack type of each signal is determined, and the relative number of signals representing each attack type is equalized (i.e., balanced) such that no given attack type is over-represented or under-represented in the malicious signal population. When balancing the benign signals, benign signals received from devices that have generated malicious signals within the rolling window 130 are discarded, and the relative number of benign signals received from each type of device in the online service 110 is equalized, such that no given device type is over-represented or under-represented in the benign signal population.
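One plausible, non-authoritative implementation of this balancing step — equalizing malicious signals per attack type by downsampling, and discarding benign signals from devices that also produced malicious signals before equalizing per device type — might look like this (the record fields are assumptions for illustration):

```python
# Balancing sketch: downsample each group to the size of the smallest group,
# after discarding benign signals from devices that also produced malicious ones.
from collections import defaultdict

def balance_by_key(signals, key):
    groups = defaultdict(list)
    for s in signals:
        groups[s[key]].append(s)
    n = min(len(g) for g in groups.values())   # downsample to the smallest group
    return [s for g in groups.values() for s in g[:n]]

def balance(malicious, benign):
    compromised = {s["device"] for s in malicious}
    clean_benign = [s for s in benign if s["device"] not in compromised]
    return (balance_by_key(malicious, "attack_type"),
            balance_by_key(clean_benign, "device_type"))

malicious = [
    {"device": "a", "attack_type": "ddos"},
    {"device": "a", "attack_type": "ddos"},
    {"device": "b", "attack_type": "scan"},
]
benign = [
    {"device": "a", "device_type": "web"},   # discarded: device "a" also produced malicious signals
    {"device": "c", "device_type": "web"},
    {"device": "d", "device_type": "db"},
]
balanced_malicious, balanced_benign = balance(malicious, benign)
```

Upsampling under-represented groups instead of downsampling over-represented ones would serve the same equalization goal; the choice here is arbitrary.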
Additionally, because the set of malicious signals is expected to be smaller than the set of benign signals, at operation 260 a portion of the set of benign signals may be selected to be cross-connected with the malicious signals to generate a new, larger set of malicious examples (attack scenarios), such that the two sets contain the desired ratio of malicious signals to benign signals. In various aspects, once the set of malicious signals and the set of benign signals reach the desired ratio (e.g., equal), the two sets are used together as a training set.
At operation 270, a training set of various attack scenarios consisting of balanced malicious and benign signals (as well as any historical signals 135) occurring in the rolling window 130 may be used to train the predictive model. For example, the production model 120 used to analyze the security signal 115 is continuously retrained and/or replaced with a different predictive model as the content of the rolling window 130 is updated over time to better evaluate attacks and vulnerabilities actively used against the online service 110. Accordingly, the method 200 may end after operation 270 or return to operation 210 to continue collecting the security signals 115 to provide the training data set periodically or continuously based on the rolling window 130.
FIG. 3 is a flow diagram illustrating the general stages involved in an example method 300 for training and selecting predictive models for protecting the online service 110. The method 300 begins with operation 310, where a training data set of balanced malicious and benign signals (e.g., developed in accordance with the method 200) is received. In various aspects, the method 300 is invoked periodically (e.g., every h hours), in response to an update to the rolling window 130 (and thus to the training data set), or in response to a user command.
Proceeding to operation 320, the training data set is split into an evaluation subset and a learning subset. In various aspects, the size of the evaluation subset may vary relative to the training data set, but it is typically smaller than the learning subset. For example, the evaluation subset may be one third of the initial training set, in which case the learning subset will be the remaining two thirds. One of ordinary skill in the art will appreciate that other portions of the training data set may be split off for use as the evaluation subset.
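The one-third/two-thirds split can be sketched as below; shuffling before the cut is an added assumption to avoid ordering bias, not a requirement stated here:

```python
# Split sketch: shuffle deterministically, then cut off the evaluation subset.
import random

def split_training_set(training_set, eval_fraction=1/3, seed=0):
    data = list(training_set)
    random.Random(seed).shuffle(data)   # deterministic shuffle for the example
    cut = int(len(data) * eval_fraction)
    return data[:cut], data[cut:]       # (evaluation subset, learning subset)

evaluation, learning = split_training_set(range(90))
```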
At operation 330, a feature configuration is received with which to generate a development model 180 as a potential predictive model for production (i.e., as the production model 120) to protect the online service 110. An administrative user (e.g., a security analyst) selects one or more parameters reported from the online service 110 via the security signals 115, as well as a feature type with which to examine those parameters. The security signals 115 include, but are not limited to: event logs, network traces, error reports, special event-listener reports, atomic detections, and combinations thereof, and the parameters of the selected features may include any element included in the security signals 115.
For example, when the security signals 115 comprise network traces, a sender/receiver address-pair parameter may be selected and evaluated according to a "count" feature type, such that each time the pair is seen within the training set, the score/value used to evaluate the feature is incremented. In another example, when the security signals 115 comprise network traces, a parameter of the number of bytes sent between a sender/receiver pair is provided as the value/score for evaluating the feature. In yet another example, a parameter indicating the transmission balance between the sender/receiver pair, i.e., the relative upload/download ratio, is provided as the value/score for evaluating the feature. Those of ordinary skill in the art will recognize that the above are non-limiting examples; other parameters, and other feature types by which those parameters may be evaluated by the predictive model, are contemplated for use with the present application.
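The three example features above (pair count, bytes sent, and upload/download balance) could be extracted from network-trace records roughly as follows; the record schema is an assumption for illustration only:

```python
# Feature-extraction sketch for one sender/receiver pair over a list of
# network-trace records. Field names (src, dst, bytes) are illustrative.

def extract_features(traces, pair):
    sent = sum(t["bytes"] for t in traces if (t["src"], t["dst"]) == pair)
    recv = sum(t["bytes"] for t in traces if (t["dst"], t["src"]) == pair)
    count = sum(1 for t in traces if (t["src"], t["dst"]) == pair)
    balance = sent / recv if recv else float("inf")   # upload/download ratio
    return {"pair_count": count, "bytes_sent": sent, "ratio": balance}

traces = [
    {"src": "10.0.0.1", "dst": "10.0.0.2", "bytes": 500},
    {"src": "10.0.0.1", "dst": "10.0.0.2", "bytes": 1500},
    {"src": "10.0.0.2", "dst": "10.0.0.1", "bytes": 1000},
]
fv = extract_features(traces, ("10.0.0.1", "10.0.0.2"))
```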
Proceeding to operation 340, the development models 180 are created based on the received feature configuration, and the development models 180 are optimized against the learning subset using one or more machine learning algorithms. Each predictive model is created to accept a particular feature vector (specifying the features selected by the administrative user), where each feature that makes up the feature vector is associated with a coefficient. Each feature vector is dynamically extracted from the security signals 115 based on the feature configuration. The values of the coefficients are adjusted over several epochs of the machine learning algorithm so that, when a given development model 180 receives a feature vector as input, the interactions between the various feature values reliably produce a malicious or benign output that matches the output specified in the learning subset.
Proceeding to operation 350, the predictive models optimized against the training data set in operation 340 are evaluated against the evaluation subset split from the training data set in operation 320. The evaluation subset includes inputs (security signals 115 collected from the online service 110) with known outputs of whether those signals are malicious or benign. In addition, the input/output pairs of the evaluation subset have not been used to directly train the development models 180, thus providing a test of whether a development model 180 has learned a generally applicable rule for determining whether an unknown signal is malicious or benign.
An upgrade threshold is applied to the development models 180 to determine whether to upgrade a given development model 180 to the production model 120. The upgrade threshold specifies how accurately the development model 180 must predict whether a signal is malicious or benign based on the feature vectors extracted from the security signals 115. In some aspects, the upgrade threshold is set to a constant, e.g., an accuracy of at least n%, a given area under the precision-recall curve with respect to the test data, and so forth. In other aspects, the upgrade threshold is set by the accuracy of the current production model 120 for a given feature vector or attack type, such that for a development model 180 to replace the production model 120 in the security system 100, the development model 180 must be more accurate than the current production model 120.
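A minimal sketch of the promotion decision, combining both kinds of upgrade threshold described above — a constant accuracy floor and a comparison against the current production model; the numeric thresholds are arbitrary examples:

```python
# Promotion-rule sketch: a development model is promoted only if its
# evaluation accuracy meets a constant floor AND beats the production model.

def should_promote(dev_accuracy, prod_accuracy, floor=0.95):
    return dev_accuracy >= floor and dev_accuracy > prod_accuracy

decisions = [
    should_promote(0.97, 0.94),   # meets floor and beats production model
    should_promote(0.96, 0.98),   # meets floor but does not beat production model
    should_promote(0.90, 0.80),   # beats production model but misses the floor
]
```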
At operation 360, whichever of the development models 180 and the re-evaluated production model 120 performs best according to the evaluation subset and the upgrade threshold is upgraded for use by the security system 100 to protect the online service 110. A production model 120 that no longer meets the upgrade threshold, or that has been replaced by a development model 180, may be deleted or downgraded to a development model 180 for further training and correction, and for later re-evaluation. The method 300 may then end.
While some embodiments have been described in the general context of program modules that execute in conjunction with an application program that runs on an operating system on a computer, those skilled in the art will recognize that aspects may also be implemented in combination with other program modules. Generally, program modules include routines, programs, components, data structures, and other types of structures that perform particular tasks or implement particular abstract data types.
The aspects and functionality described herein may be implemented via a number of computing systems, including but not limited to: desktop computer systems, wired and wireless computing systems, mobile computing systems (e.g., mobile phones, netbooks, tablet or slate computers, notebook computers, and laptop computers), handheld devices, multiprocessor systems, microprocessor-based or programmable consumer electronics, minicomputers, and mainframe computers.
Further, according to an aspect, the aspects and functions described herein operate on a distributed system (e.g., a cloud-based computing system) in which application functions, memory, data storage and retrieval, and various processing functions are operated remotely from one another over a distributed computing network (e.g., the internet or an intranet). According to an aspect, various types of user interfaces and information are displayed via an onboard computing device display, or via a remote display unit associated with one or more computing devices. For example, various types of user interfaces and information may be displayed on, and interacted with on, a wall surface onto which they are projected. Interactions with the numerous computing systems with which embodiments are practiced include: keystroke inputs, touch screen inputs, voice or other audio inputs, gesture inputs where the associated computing device is equipped with a detection (e.g., camera) function for capturing and interpreting user gestures for controlling functions of the computing device, and so forth.
FIG. 4 and the associated description provide a discussion of various operating environments in which examples are practiced. However, the devices and systems illustrated and discussed with respect to fig. 4 are for purposes of example and illustration, and are not limiting of the vast number of computing device configurations for practicing various aspects described herein.
Fig. 4 is a block diagram illustrating physical components (i.e., hardware) of a computing device 400 that may be used to practice examples of the present disclosure. In a basic configuration, computing device 400 includes at least one processing unit 402 and system memory 404. According to an aspect, depending on the configuration and type of computing device 400, system memory 404 is a memory storage device including, but not limited to: volatile storage (e.g., random access memory), non-volatile storage (e.g., read only memory), flash memory, or any combination of such memories. According to one aspect, the system memory 404 includes an operating system 405 and one or more program modules 406 adapted to run software applications 450. According to one aspect, the system memory 404 includes the security system 100, the model training and selection system 105, and any models used or generated thereby. For example, operating system 405 is suitable for controlling the operation of computing device 400. Further, the various aspects are practiced in conjunction with a graphics library, other operating systems, or any other application program, and are not limited to any particular application or system. This basic configuration is illustrated in fig. 4 by those components within dashed line 408. According to an aspect, the computing device 400 has additional features or functionality. For example, according to an aspect, computing device 400 includes additional data storage devices (removable and/or non-removable) such as, for example, magnetic disks, optical disks, or tape. Such additional storage is illustrated in FIG. 4 by removable storage 409 and non-removable storage 410.
As mentioned above, according to an aspect, a number of program modules and data files are stored in system memory 404. When executed on processing unit 402, program modules 406 (e.g., security system 100, model training and selection system 105) perform processes including, but not limited to, one or more stages of methods 200 and 300 shown in fig. 2 and 3, respectively. According to an aspect, other program modules are used according to examples and include applications such as: e-mail and contacts applications, word processing applications, spreadsheet applications, database applications, slide presentation applications, drawing or computer-aided application programs, and the like.
According to an aspect, computing device 400 has one or more input devices 412, such as a keyboard, mouse, pen, voice input device, touch input device, etc. According to an aspect, output device(s) 414 such as a display, speakers, printer, etc. are also included. The foregoing devices are examples, and other devices may be used. According to an aspect, computing device 400 includes one or more communication connections 416 that allow communication with other computing devices 418. Examples of suitable communication connections 416 include, but are not limited to: radio Frequency (RF) transmitter, receiver, and/or transceiver circuitry; universal Serial Bus (USB), parallel, and/or serial ports.
The term "computer readable media" as used herein includes computer storage media. Computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information, such as computer readable instructions, data structures, or program modules. The system memory 404, the removable storage device 409, and the non-removable storage device 410 are all computer storage media examples (i.e., memory storage devices). According to an aspect, computer storage media includes, but is not limited to: RAM, ROM, electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other article of manufacture that can be used to store the desired information and that can be accessed by the computing device 400. According to an aspect, any such computer storage media is part of the computing device 400. Computer storage media does not include a carrier wave or other propagated data signals.
In one aspect, communication media is embodied by computer readable instructions, data structures, program modules, or other data in a modulated data signal, such as a carrier wave or other transport mechanism, and includes any information delivery media. According to an aspect, the term "modulated data signal" describes a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. By way of example, and not limitation, communication media includes wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, Radio Frequency (RF), infrared and other wireless media.
Aspects of the present disclosure, for example, are described above with reference to block diagrams and/or operational illustrations of methods, systems, and computer program products according to various aspects. The functions/acts noted in the blocks may occur out of the order noted in any flow diagrams. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality/acts involved.
The description and illustration of one or more examples provided in this application is not intended to limit or restrict the scope of the claims in any way. The aspects, examples, and details provided in this application are deemed sufficient to convey possession and enable others to make and use the best mode. The embodiments should not be construed as limited to any aspect, example, or detail provided in this application. Whether shown and described in combination or separately, various features (both structural and methodological) are intended to be selectively included or omitted to produce an example with a particular set of features. Having provided a description and illustration of the present application, those skilled in the art will envision variations, modifications, and alternative examples falling within the spirit of the broader aspects of the general inventive concepts embodied in this application without departing from the broader scope.

Claims (15)

1. A method for securing online services via a continuous learning model, comprising:
collecting a set of security signals from the online service, wherein the set of security signals are collected in a rolling time window;
identifying whether each security signal in the set of security signals is malicious or benign;
balancing malicious signals in the set of secure signals with benign signals in the set of secure signals to produce a balanced training data set; and
generating a predictive model based on the balanced training dataset, wherein the predictive model is configured to identify whether a security signal is malicious or benign.
2. The method of claim 1, wherein identifying whether each security signal in the set of security signals is malicious or benign further comprises:
examining each security signal at a production model, wherein the production model is produced by a model trainer from the balanced training data set and is configured to produce a detection result of whether a given security signal is malicious or benign;
sending the detection result to an analyst user; and
in response to receiving an action from the analyst user for the detection result, updating the detection result to indicate whether the given security signal is malicious or benign.
3. The method of claim 2, wherein an automated attacker models an attack on the online service, and wherein identifying whether each security signal in the set of security signals is malicious or benign further comprises:
receiving a notification from the automated attacker identifying the security signal generated in response to the attack; and
the security signal generated in response to the attack is considered malicious regardless of the detection result.
4. The method of claim 2, wherein identifying whether each security signal in the set of security signals is malicious or benign further comprises:
extracting features from the given security signal;
determining whether the features extracted from the given security signal satisfy a set of features specified by an administrative user as a defined attack type;
designating the given security signal as malicious in response to determining that the extracted features satisfy the feature set; and
designating the given security signal as benign in response to determining that the extracted features do not satisfy the feature set.
5. The method of claim 4, wherein balancing the malicious signals with the benign signals further comprises:
identifying the attack type of each of the malicious signals;
balancing the relative number of attack types of the set of attack types observed for the malicious signal by at least one of:
increasing a relative number of underrepresented attack types in the set of attack types; and
reducing a relative number of over-represented attack types in the set of attack types.
6. The method of claim 4, wherein the feature set is identified in a structured document submitted by the administrative user, the structured document specifying types of features and data fields to be observed in the secure signal, and the features in the feature set are dynamically extracted from the secure signal based on the structured document without having to modify code.
7. The method of claim 1, wherein a historical data signal is included in the set of security signals.
8. The method of claim 1, wherein balancing the malicious signals with the benign signals further comprises:
identifying a device in the online service from which at least one malicious signal was collected within the rolling window; and
removing a benign signal associated with the device from the set of secure signals.
9. A system for securing online services via a continuous learning model, comprising:
a processor; and
a memory storage device comprising instructions operable when executed by the processor to:
receiving a security signal from a device within the online service;
extracting a feature vector from each of the security signals, wherein a given feature vector provides a numerical value indicative of a state of a given device from which a given security signal was received;
generating, via an associated predictive model, a detection result for each of the feature vectors, wherein a given detection result identifies whether the given security signal indicates malicious or benign activity on the given device;
defining a rolling window, wherein the rolling window comprises a plurality of security signals and associated detection results that have been received within a time frame starting from a current time;
generating a balanced training data set for the rolling window, wherein to generate the balanced training data set, the system is further configured to:
identifying a type of attack of each of the security signals in the rolling window that is identified as indicative of malicious activity;
increasing the number of security signals in the rolling window identified as having an underrepresented attack type relative to security signals identified as having an over represented attack type; and
cross-connecting the security signals identified as indicative of malicious activity with security signals identified as indicative of benign activity to generate an attack scenario for the rolling window; and
updating the associated predictive model based on the balanced training data set according to a machine learning algorithm.
10. The system of claim 9, wherein updating the associated predictive model comprises: in response to the machine learning algorithm indicating that a development model more accurately identifies whether the security signals indicate malicious or benign activity on the devices according to the balanced training data set, replacing a production model used to generate the detection results with the development model, the development model having been developed from the balanced training data set according to the machine learning algorithm.
11. The system of claim 9, wherein a historical signal is included in the rolling window, wherein the historical signal includes security signals collected outside of the time frame.
12. The system of claim 9, wherein the security signals received from the devices within the online service comprise security signals generated in response to an automated attacker performing known malicious activity for the online service, and wherein the detection results generated for the security signals generated in response to the automated attacker performing the known malicious activity are set to: indicating that the given security signal indicates malicious activity based on a notification from the automatic attacker.
13. The system of claim 9, wherein to generate the balanced training data set, the system is further configured to:
in response to identifying that a particular device from which a security signal identified as indicative of benign activity is received is associated with one or more security signals in the rolling window identified as indicative of malicious activity, removing the security signal identified as indicative of benign activity from the rolling window.
14. The system of claim 9, wherein to generate the balanced training data set, the system is further configured to:
identifying a device type from which each of the security signals in the rolling window is received that is identified as indicative of benign activity; and
increasing the number of security signals in the rolling window identified as having a device type that is not adequately represented relative to security signals identified as having a device type that is over represented.
15. A computer-readable storage device comprising processor-executable instructions for protecting online services via a continuous learning model, comprising:
collecting a set of security signals from the online service, wherein the set of security signals are collected in a rolling time window;
examining each security signal of the set of security signals via a predictive model to identify whether each security signal is malicious or benign, wherein the predictive model is configured to generate a detection result of whether a given security signal is malicious or benign based on a feature vector defined by an administrative user;
correlating the security signal with the detection result to identify the security signal as a malicious signal or a benign signal;
balancing the malicious signals with the benign signals to produce a balanced training data set, comprising:
identifying an attack type of each of the malicious signals;
identifying a device type from which each of the benign signals was collected;
equalizing a relative number of malicious signals in the rolling window based on the identified attack type to produce a set of attack examples;
equalizing the relative number of benign signals in the rolling window based on the identified device type to produce a set of identified benign examples; and
cross-connecting the set of attack examples with at least a portion of the set of benign examples to balance a number of attack examples in the set of attack examples relative to a number of benign examples in the set of benign examples; and
optimizing the predictive model based on the balanced training dataset and a machine learning algorithm.
HK62020002818.6A 2017-01-30 2018-01-22 Continuous learning for intrusion detection HK40013358A (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US15/419,933 2017-01-30

Publications (1)

Publication Number Publication Date
HK40013358A true HK40013358A (en) 2020-08-07

