RU2790329C1

RU2790329C1 - Method for detecting an anomaly in the behavior of a trusted process and a system for its implementation

Info

Publication number: RU2790329C1
Application number: RU2022112463A
Authority: RU
Inventors: Андрей Александрович Иванов
Original assignee: Акционерное общество "Лаборатория Касперского"
Filing date: 2022-05-06
Publication date: 2023-02-16

Abstract

FIELD: detection of malicious behavior.

SUBSTANCE: the invention relates to solutions for detecting cases of malicious behavior based on the exploitation of vulnerabilities in trusted processes. The stated effect is achieved by jointly applying a module that includes a machine learning algorithm and a stochastic modeling tool, namely a Markov chain. Basic behavior models of each trusted process are built on Markov chains, and each event that occurs is assigned a weighting factor. The weighting factor of an event indicates the probability that the event will occur during execution of the trusted process.

EFFECT: improving the efficiency of detecting anomalies in the behavior of trusted processes.

10 cl, 5 dwg

Description

Technical field

The present invention relates to solutions for detecting cases of malicious behavior based on the exploitation of vulnerabilities in trusted processes, and in particular to methods and systems for detecting anomalies in the behavior of trusted processes.

State of the art

The fight against malware is one of the most important tasks in modern computer security. Over the past decade, the field of technical countermeasures against malware has entered a crisis. The rivalry between attackers, who use malware and software vulnerabilities, and antivirus software developers has left the latter playing catch-up. Attackers grow more inventive each time they create malware or exploit vulnerabilities in trusted programs for gain.

Attackers have now learned to hide their activity effectively. Masking techniques have rendered traditional signature-based and threshold-based detective measures almost useless, where detective measures are measures that help detect incident-related activity. At the same time, the information security industry has begun to develop behavioral analysis, which searches for activity patterns and mathematically significant deviations in user behavior (use of suspicious applications, file search operations) or program (processor) behavior, based on historical baseline data.

Among the promising directions are technical solutions that detect malicious programs or deviations in the behavior of trusted programs by searching for anomalies, since they:

• allow individual protection to be built for the user's device;

• do not require frequent database updates;

• are capable of detecting both known and previously unknown threats.

By exploiting vulnerabilities in a program, attackers may gain the ability to perform malicious actions on behalf of a trusted process. Existing methods for detecting the exploitation of vulnerabilities and the destructive actions that follow it are built on classical methods, such as signature and heuristic analysis, and are not always able to detect the exploitation of a previously unknown vulnerability.

Overcoming the limitations of existing solutions for finding anomalies in the behavior of trusted processes requires new solutions that address the problem of finding anomalies and vulnerabilities in programs. Accordingly, solutions are proposed that use machine learning algorithms together with Markov chains to detect the exploitation of vulnerabilities in trusted applications, namely to detect anomalies in the behavior of trusted processes.

Disclosure of the invention

The present invention relates to solutions for detecting cases of malicious behavior based on the exploitation of vulnerabilities in trusted processes, and in particular to methods and systems for detecting anomalies in the behavior of trusted processes. The present invention consists in the joint application of a module implemented on the basis of at least one machine learning method and a module that uses a stochastic modeling tool, namely a Markov chain.

The technical result of the present invention is an increase in the efficiency of detecting anomalies in the behavior of trusted processes, achieved by applying basic behavior models based on the Markov chain principle together with a machine learning module based on at least one machine learning principle.

In one embodiment of the present invention, a method for detecting an anomaly in the behavior of a trusted process is provided, according to which: the launch of a trusted process in a computer system is detected; a basic behavior model expressed as a Markov chain and a machine learning model (ML Model) for the file from which the trusted process was launched are requested from the data store; data about the trusted process is collected using the basic behavior model, during which the events that occur are tracked according to the Markov chain from the basic behavior model, the probability of occurrence of each event in the trusted process is determined by applying the Markov chain, and the overall probability of occurrence of the events that have occurred is calculated; the overall probability of occurrence of the events that have occurred is compared with a predetermined threshold; when the predetermined threshold is exceeded, the events that occurred while following the Markov chain are extracted and the event data is passed to the input of the ML Model, the data including the chain of events that occurred; the event data is analyzed using the ML Model; and, as a result of the analysis, a decision about anomalous behavior is made.

In another embodiment of the method, the basic behavior model, expressed as a Markov chain, contains chains of events that occur during the execution of a trusted process, with each event having its own probability of occurrence.

In yet another embodiment of the method, the events are trusted, wherein trusted events have a high probability of occurring in the chain, and the probability is calculated as the ratio of the number of occurrences of a particular event for the current process to the total number of occurrences of similar events for the current process.

In another embodiment of the method, the ML Model is pre-trained on the occurrence of text words found in the attributes of the events in the chain.
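The word-occurrence input described above can be illustrated with a minimal bag-of-words sketch. The event representation (a dictionary of attribute names to string values), the tokenization rule, the fixed vocabulary and all names below are assumptions made for illustration; the source does not specify how words are extracted or which model consumes them.

```python
import re
from collections import Counter


def tokenize(text):
    """Split an attribute value into lowercase word tokens (assumed rule)."""
    return re.findall(r"[a-z0-9_]+", text.lower())


def bag_of_words(chain):
    """Count word occurrences over the attributes of every event in a chain.

    `chain` is a list of events; each event is a hypothetical dict that maps
    an attribute name to its string value.
    """
    words = Counter()
    for event in chain:
        for value in event.values():
            words.update(tokenize(str(value)))
    return words


def vectorize(bag, vocabulary):
    """Turn a bag of words into a fixed-length count vector for a model."""
    return [bag.get(word, 0) for word in vocabulary]


# Hypothetical chain of two events with made-up attributes.
chain = [
    {"type": "file_write", "path": "C:\\Users\\a\\doc.tmp"},
    {"type": "process_create", "image": "cmd.exe"},
]
bag = bag_of_words(chain)
features = vectorize(bag, ["cmd", "exe", "tmp", "powershell"])
```

A count vector of this kind could be passed to any classifier that accepts numeric features; which classifier is used is left open here, as it is in the text.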

In yet another embodiment of the method, the ML Model is trained in advance both on logs without anomalies and on logs with them, where, as a result of training, anomalous events are assigned low coefficients.

In another embodiment of the method, when anomalous behavior is detected, the trusted process is blocked and the user is notified.

In yet another embodiment of the method, when a decision about anomalous behavior is made, a false-positive check is performed by sending data about the decision to the component responsible for that check and receiving a response with the results of the check.

In another embodiment of the method, the result of the false-positive check is received, and if an error is found in the decision about the detected anomaly, the ML Model is retrained with the data whose analysis produced the error.

In yet another embodiment of the method, the chain of events that have occurred includes the events from the initial event up to the event after which the predetermined threshold was reached.

In another embodiment of the present invention, a system is provided that contains at least one computer including the following interacting means: a learning module, a monitoring tool, an analysis tool, a data store containing statistical data on the operation of processes, basic behavior models and ML Models, and stored machine-readable instructions which, when executed, cause the system to detect an anomaly in the behavior of a trusted process according to any of the above embodiments of the method.

Brief description of the drawings

Additional objects, features and advantages of the present invention will become apparent from the following description of embodiments of the invention with reference to the accompanying drawings, in which:

Fig. 1 schematically shows an example implementation of a system for carrying out the method for detecting anomalies in the behavior of trusted processes based on Markov chains and a machine learning algorithm.

Fig. 2 shows an example of a basic behavior model of a trusted process in the form of a directed graph.

Fig. 3 is a flowchart illustrating the method for detecting anomalies in the behavior of trusted processes based on Markov chains and a machine learning algorithm.

Fig. 4 is a flowchart illustrating an example of forming a basic behavior model and a machine learning model (ML Model) for a trusted process.

Fig. 5 illustrates an example of a general-purpose computer system with which the claimed invention may be implemented.

Although the invention may have various modifications and alternative forms, the characteristic features shown by way of example in the drawings will be described in detail. It should be understood that the purpose of the description is not to limit the invention to a particular embodiment. On the contrary, the purpose of the description is to cover all changes and modifications falling within the scope of the invention as defined by the appended claims.

Description of embodiments of the invention

The objects and features of the present invention, and the methods for achieving them, will become apparent by reference to exemplary embodiments. However, the present invention is not limited to the exemplary embodiments disclosed below and may be embodied in various forms. The description is provided to help a person skilled in the art gain a comprehensive understanding of the invention, which is defined only by the scope of the appended claims.

The present invention consists in the joint application of a module whose operation is based on at least one machine learning method and a stochastic modeling tool, namely a Markov chain. The purpose of the present invention is to detect anomalies in the behavior of trusted processes. Basic behavior models of each trusted process are built on the Markov chain principle, and each event that occurs is assigned a weighting factor. The weighting factor of an event indicates the probability that the event will occur during execution of the trusted process. While a trusted process executes in the operating system (OS), the events that occur are observed and compared with the events from the basic behavior model, and a weighting factor is determined for each event. The overall probability of occurrence of the entire chain of events is calculated from the weighting factors, and a behavioral log is formed from the events that have occurred. When the overall probability reaches a given threshold, the behavioral log is sent to the input of a module that performs analysis using a machine learning algorithm. Based on that module's analysis of the behavioral log, the presence of an anomaly in the behavior of the trusted process is assessed. Thus, the module outputs a decision about anomalous behavior of the trusted process based on the series of events that have occurred and/or the absence of certain events in that series. In addition, the detected anomaly may subsequently help identify the software vulnerability through which the corresponding trusted process is being exploited.
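The monitoring step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the overall probability is taken as the product of per-event weighting factors, the threshold is treated as reached when that product falls to or below it, and events missing from the model receive a small floor probability. The function name, the event labels and the numeric values are all hypothetical.

```python
import math


def follow_chain(events, weights, threshold, floor=1e-6):
    """Track events and report when the chain probability reaches a threshold.

    events    -- iterable of observed event names (hypothetical labels)
    weights   -- dict mapping event name to its weighting factor (probability)
    threshold -- chain probability at or below which the log is handed over
    floor     -- assumed probability for events absent from the model

    Returns (triggered, behavioral_log). The log holds the events from the
    first one up to the event at which the threshold was reached.
    """
    log_threshold = math.log(threshold)
    log_prob = 0.0  # log of the running product, avoids numeric underflow
    behavioral_log = []
    for event in events:
        log_prob += math.log(weights.get(event, floor))
        behavioral_log.append(event)
        if log_prob <= log_threshold:
            return True, behavioral_log
    return False, behavioral_log


# Hypothetical weighting factors for one trusted process.
weights = {"open_file": 0.5, "read_file": 0.4, "create_process": 0.01}

triggered, log = follow_chain(
    ["open_file", "read_file", "create_process", "read_file"],
    weights,
    threshold=0.01,
)
# Here the rare "create_process" event pushes the chain probability to
# 0.5 * 0.4 * 0.01 = 0.002, which is below the 0.01 threshold.
```

In this sketch the returned log would then be passed to the machine learning module for the final decision. Working in log-probabilities is a design choice that keeps long chains numerically stable.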

A behavioral log is an ordered set of records of events that arose from the operation of a particular trusted process. When the basic behavioral log (basic behavior model) is formed, the Markov chain is used as a tree representation of the events that occur during execution of the trusted process, with a probability of occurrence (weighting factor) determined for each event.

A trusted process is a process launched from trusted software (an executable file), where the executable file must be signed as a safe file.

Some embodiments of the claimed invention are presented below.

Fig. 1 shows a schematic example of an implementation of a system 100 for carrying out the method for detecting anomalies in the behavior of trusted processes based on Markov chains and a machine learning algorithm.

The system 100 includes the following means: a learning module 110, a data store 120, a monitoring tool 130 and an analysis tool 140. The data store 120, in turn, contains statistical data on the operation of processes 150, basic behavior models 160 and ML (machine learning) models 170.

To implement the claimed invention, information about the operation of software on a computer system must be collected; if the claimed invention is used on several computer systems connected in a network, for example a corporate network, this collection can be performed on all computer systems (CS) of the network. Based on information about the software in use, a list is determined of the processes whose execution on the CS will be considered trusted. Next, the learning module 110 forms a basic behavior model 160 and an ML model 170 for each piece of software and its trusted process. Each basic behavior model 160 is based on the Markov chain principle and additionally contains a weighting factor for each event in the chain. The basic behavior model 160 is formed from the collected information about the operation of the software, which is presented as a behavioral log. The ML model 170 is a machine learning model.

It should be noted that the collection of information in the form of a behavioral log and the subsequent formation of basic behavior models 160 can be performed in advance and then applied on each CS when implementing the claimed invention. In this case, the basic behavior models 160 are provided together with the solution or transmitted via the Internet 190, and can also be updated via the Internet 190. This option is preferable, in particular, for standard software. The advance formation of basic behavior models 160 can be carried out on a remote server (not shown in Fig. 1).

In another embodiment, the basic behavior models 160 are formed directly at the start of operation, in a preparation mode. In this mode, information about the operation of at least one piece of software is collected in real time during its execution, basic behavior models 160 are formed for its processes, and an ML model 170 is formed for each piece of software. This approach is preferable for software that is not widespread or for use in a closed corporate network.

Depending on the embodiment, the basic behavior model 160 is formed both on behavioral logs containing only trusted events and on behavioral logs that additionally contain anomalous events. When the basic behavior model 160 is formed, anomalous events are assigned low weighting factors. Thus, the basic behavior model 160 contains all theoretically possible events that can occur in a trusted process, with different weighting factors for the particular software.

After the behavioral log for a trusted process is formed, a weighting factor is determined for each event in that log. The weighting factor for each event is determined by collecting additional, similar information about events occurring in the same software on other CS. The information is collected either from CS within one corporate network, or from all CS via the Internet 190, or from a cloud server that holds the necessary statistics on the operation of trusted processes, also via the Internet 190.

In one embodiment, the weighting factor for an event is formed from the probability of the event's occurrence across all behavioral logs in the collected statistics. In other words, each event in the chain carries the probability of its occurrence, calculated as the ratio of the number of occurrences of the particular event for the current process to the total number of occurrences of similar events for a similar process.
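The ratio described above can be illustrated with a short sketch. It assumes one reading of the text: the weighting factor of an event is its number of occurrences divided by the total number of event occurrences across all collected behavioral logs of the same process. The function name and the event labels are hypothetical.

```python
from collections import Counter


def event_weights(logs):
    """Estimate a weighting factor per event as count(event) / count(all).

    `logs` is a list of behavioral logs collected for the same trusted
    process; each log is a list of event names (hypothetical labels).
    """
    counts = Counter(event for log in logs for event in log)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {event: n / total for event, n in counts.items()}


# Three hypothetical logs of the same process gathered from different systems.
logs = [
    ["open_file", "read_file", "close_file"],
    ["open_file", "read_file", "read_file", "close_file"],
    ["open_file", "create_process"],
]
weights = event_weights(logs)
# "create_process" occurs once among nine recorded events, so it receives
# the lowest weighting factor in this example.
```

Frequent events end up with high weighting factors and rare ones with low factors, which matches the qualitative behavior the text describes for trusted and anomalous events.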

As a rule, trusted processes contain trusted events. A trusted event is, as a rule, an event that is not part of malicious activity (a targeted attack) and is characteristic of the behavior of the corresponding software. Trusted events have a high probability of occurring during the execution of a trusted process and therefore have a high weighting factor. If for some reason trusted events are rare, their weighting factor will not be high, but this property, namely rarity, is taken into account when the behavioral log is analyzed by the ML model 170 before the final decision is made.

An example of a basic behavior model of a trusted process in the form of a bipartite directed graph (digraph) is shown in Fig. 2, where the vertices are processes (hatched red circles) and events (solid green circles), the arcs are transitions between processes and events, and the thickness of an arc indicates the probability of that transition, i.e. the probability that the process or event occurs.

An example approach to forming a behavioral log (collecting information) is based on the following principle.

A process in the Windows operating system (OS) consists of a set of threads, and every process has at least one thread. The software starts at its entry point and runs until it terminates, and its running time may be unlimited. While running, a thread interacts only with its own memory. To change the state of the OS (for example, to read from or write to a file, or to work with the network or with other processes), a thread calls an OS kernel service using one of the instructions syscall, sysenter, or int XXh, i.e. a system call takes place. All kernel services are numbered, and the addresses of their handler functions are stored in the SSDT system table. As a rule, programs use functions from the OS system libraries (ntdll.dll, kernel32.dll, and others), which are wrappers around these calls.

In one implementation, information about the process is collected by the driver 180, which records the system call numbers for each thread and thereby forms a behavioral log for the process of the corresponding software. The behavioral log contains the names of the called functions associated with the system call numbers, together with the parameters of those functions.
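One possible shape for such a behavioral log entry is sketched below. All names (`LogEntry`, `record`, `SYSCALL_NAMES`) are hypothetical, and the system call numbers in the lookup table are placeholders chosen for illustration; real numbers differ between Windows builds and are not given in this description.

```python
from dataclasses import dataclass, field

# Placeholder numbering for illustration only (an assumption, not real values).
SYSCALL_NAMES = {
    1: "NtCreateFile",
    2: "NtWriteFile",
    3: "NtCreateUserProcess",
}


@dataclass
class LogEntry:
    thread_id: int
    syscall_number: int
    function_name: str
    parameters: dict = field(default_factory=dict)


def record(log, thread_id, syscall_number, parameters=None):
    """Append one entry to the behavioral log, resolving the number to a name."""
    name = SYSCALL_NAMES.get(syscall_number, f"unknown_{syscall_number}")
    entry = LogEntry(thread_id, syscall_number, name, dict(parameters or {}))
    log.append(entry)
    return entry


if __name__ == "__main__":
    behavioral_log = []
    record(behavioral_log, 7, 1, {"path": "C:\\temp\\a.txt"})
    record(behavioral_log, 7, 3, {"image": "cmd.exe"})
    for item in behavioral_log:
        print(item)
```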

In another implementation, the behavioral log for a particular software product and its process can be taken from previously collected statistical data on the operation of processes 150 held in the data store 120. The statistical data 150 include at least system call numbers, function names, and function parameters that were collected while observing the execution of trusted processes on the CS, or were received from other CSs located in the corporate network or connected via the Internet 190, or from a remote server that communicates via the Internet 190.

In yet another implementation, the data store 120 may be located on a remote CS that is accessed over a network, in particular over the Internet 190.

The training module 110 is also designed to generate a machine learning model (ML model) 170. The ML model for each software product and its trusted process is built on the standard principles used to build machine learning models for tasks of this kind, namely estimating the probability that an anomaly is present. Examples of ML models are models based on one or a combination of the following: naive Bayes classifiers, artificial neural networks, decision trees, and support vector machines (SVM).

The input parameters for training and subsequent operation of each ML model 170 are the parameters of the events that occur while the trusted process is running. In one implementation, the parameters are text data (for example, words) taken from the names of functions that appeared in the events occurring in the trusted process. Examples of such words corresponding to events are CreateProcess, CreateFile, and RegSetValue. In another implementation, the function parameters are additionally used as input parameters. In yet another implementation, the training module 110 requests ML models 170 for the required software and the corresponding trusted processes, over the Internet 190, from a remote server that holds ML models 170 generated in advance for various software and the corresponding trusted processes.

It should be noted that, depending on the implementation of the invention, the training module 110 may also be implemented on another CS. In that case, communication with it takes place over a network, for example the Internet 190.

Once all the basic behavior models 160 needed to detect anomalies in the behavior of trusted processes on the CS have been prepared, the ML models 170 have been trained, and both have been added to the data store 120, the claimed invention switches from preparation mode to main operation mode.

In main operation mode, when a new trusted process, for example process 105, is started, it is detected by the monitoring tool 130. The monitoring tool 130 then queries the data store 120 and selects the basic behavior model 160 that corresponds to the detected trusted process 105. After that, the monitoring tool 130 monitors the trusted process 105. During monitoring, the monitoring tool 130 compares the events that occur with the events in the basic behavior model 160 and, on that basis, calculates in real time the overall probability of occurrence of the observed events. The calculation determines the probability of occurrence (weighting factor) of each event in the chain of events from the basic behavior model 160 and multiplies these probabilities together. When a predetermined threshold is reached, data are extracted from the traversed part of the chain of the basic behavior model 160 and sent to the analysis tool 140. The extracted data are a behavioral log that contains only the events that have occurred, whose overall probability of occurrence has crossed the predetermined threshold, together with the parameters of those events. For example, the words found in the attributes of the events that occurred are used as parameters.
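The running multiplication and threshold check can be sketched as below. This is an illustrative sketch under stated assumptions: the function name `monitor` is hypothetical, each observed event is assumed to arrive already paired with its weight from the basic behavior model, and the threshold is assumed to count as reached once the running product falls to or below it, which is consistent with the winword.exe example in this description.

```python
def monitor(events, threshold):
    """Multiply event weights as events arrive and stop at the threshold.

    events: iterable of (event_name, weight) pairs in order of occurrence.
    Returns (traversed_event_names, overall_probability) once the running
    product is at or below the threshold, or None if it never gets there.
    """
    overall = 1.0
    traversed = []
    for name, weight in events:
        overall *= weight
        traversed.append(name)
        if overall <= threshold:
            # Only the traversed part of the chain is handed on for analysis.
            return traversed, overall
    return None


if __name__ == "__main__":
    chain = [
        ("winword.exe", 1.0),
        ("application launch", 0.9),
        ("cmd.exe", 0.8),
        ("application launch", 0.8),
        ("svchost.exe", 0.2),
    ]
    print(monitor(chain, threshold=0.2))
```

Returning only the traversed events mirrors the statement that the extracted behavioral log contains only the events that have actually occurred.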

The analysis tool 140 queries the data store 120 to obtain the ML model 170 that corresponds to the software being analyzed. The analysis tool 140 then passes the parameters received from the monitoring tool 130 to the input of the ML model 170 and uses the model to analyze them. In one implementation, the ML model splits the text data into words and performs the analysis on those words. Based on the results of the analysis, a decision is made as to whether the trusted process exhibits anomalous behavior. For example, the decision will indicate anomalous behavior of the trusted process if the resulting chain of transitions between processes and events of the form winword -> CreateProcess -> wmic -> CreateProcess contains the strings powershell and encodedcommand. An example in which the decision indicates no anomalous behavior is a chain of the form excel -> CreateProcess with the string calc.exe.
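The word-splitting step mentioned above might look like the following sketch. It covers only the extraction of words as model input, not the ML model itself. The function name `bag_of_words`, the lowercasing, and the token pattern are assumptions made for illustration, and the sample command line is hypothetical.

```python
import re
from collections import Counter


def bag_of_words(text):
    """Split chain text into lowercase word tokens and count them.

    Tokens are runs of letters, digits, underscores and dots, so arrows
    and option dashes are dropped while names like calc.exe stay whole.
    """
    return Counter(re.findall(r"[a-z0-9_.]+", text.lower()))


if __name__ == "__main__":
    chain_text = (
        "winword -> CreateProcess -> wmic -> CreateProcess "
        "powershell -EncodedCommand"
    )
    print(bag_of_words(chain_text))
```

The resulting word counts could then be supplied to whichever classifier is used, for example a naive Bayes classifier.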

It should be noted that, depending on the implementation of the claimed invention, the decision on the presence of anomalous behavior made by the analysis tool 140 may be either final or intermediate. When the decision is intermediate, the final decision combines the decision of the analysis tool 140 with the results received from the monitoring tool 130, which include information about the crossed threshold and the behavioral log. In one implementation, the final decision is the average of the probability for the chain and the decision of the ML model.

The following example illustrates the detection of anomalies in the behavior of a trusted process using the executable file "winword.exe" of the Microsoft Word software.

Suppose that a basic behavior model expressed as a Markov chain has been created for the executable file "winword.exe", and that an ML model has been trained for it.

Also suppose that the basic behavior model for the trusted process of the executable file "winword.exe" includes the following events:

winword.exe performs:

Application launch (probability of occurrence, or weighting factor: 0.9);

As events occur in the trusted process, the system 100 uses the monitoring tool 130 to compare them with the events in the above chain of the basic behavior model, advances along the chain, and calculates the overall probability of occurrence of the events that have taken place. When the predetermined threshold is reached, the monitoring tool 130 extracts data from the traversed part of the chain and sends them to the analysis tool 140, namely to the input of the ML model, for further analysis. Based on the results of that analysis, the analysis tool 140 makes a decision as to whether the trusted process exhibits anomalous behavior.

In one scenario of trusted process operation, the traversed part of the chain may contain the following events:

1. winword.exe (Probability: 1.0, since it is the root process).

2. Application launch (Probability: 0.9).

3. cmd.exe (Probability: 0.8).

4. Application launch (Probability: 0.8).

5. svchost.exe (Probability: 0.2).

Consequently, the output of the chain (the overall probability of occurrence of the events that have taken place) is: 1.0 × 0.9 × 0.8 × 0.8 × 0.2 = 0.1152.

If the predetermined threshold was 0.2, the overall probability has crossed it, since 0.1152 is below 0.2. Accordingly, the monitoring tool 130 makes a preliminary decision that an anomaly is present. The data are then extracted and passed to the analysis tool 140 as input to the ML model. Suppose the data consist of words from the attributes of the system calls in the chain (for example, the names of functions and events). The analysis tool 140 uses the ML model to analyze the received data word by word and, on that basis, makes the final decision on anomalous behavior.
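The arithmetic of this example can be written out as a product over the five traversed events. The symbols P, p_i, and T are introduced here only for illustration and do not appear in the original description.

```latex
P = \prod_{i=1}^{5} p_i
  = 1.0 \times 0.9 \times 0.8 \times 0.8 \times 0.2
  = 0.1152,
\qquad
P = 0.1152 < T = 0.2
```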

In a particular implementation, when the system 100 decides that an anomaly is present in the behavior of a trusted process, it additionally transmits information, including the decision made and information about the software and its trusted process, to at least one external protection component of the CS (not shown in Fig. 1). External protection components may be components of the CS or protection components (for example, an antivirus) that provide the following mechanisms: preventing infection, sending a notification to the user or to the corporate network management tool, blocking the process, and checking the decision for a false positive.

In another particular implementation, when the system 100 transmits information to the CS component responsible for checking for false positives, it waits for a response and does not transmit the information to other CS components. The response states either that there is no false positive, or that there is one, together with information about the error that led to the erroneous decision. If the false positive is not confirmed, the decision remains in force, and the system 100 passes the information to other CS components to protect the data and the OS. Otherwise, if the false positive is confirmed, the system 100 revokes the decision about the detected anomaly in the behavior of the trusted process and, using the training module 110, retrains the corresponding ML model and/or modifies the basic behavior model on the basis of the information received from that CS component. The CS component responsible for checking for false positives may be implemented either as a technical solution that performs the check on its own or as one that interacts with a user (administrator) through input and output elements. A technical solution that checks for false positives on its own collects data about the decision being verified from other similar solutions and applies various verification rules in its analysis.
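The false-positive handling described above can be sketched as a small control flow. The function `handle_anomaly` and its callback parameters are hypothetical names introduced for illustration; the description does not specify any such interface.

```python
def handle_anomaly(decision, check_false_positive, notify_components, retrain):
    """Wait for the false-positive check before informing other components.

    check_false_positive(decision) -> (is_false_positive, error_info)
    notify_components(decision): forwards a confirmed decision.
    retrain(error_info): retrains the ML model and/or adjusts the behavior model.
    Returns "revoked" or "confirmed".
    """
    is_false_positive, error_info = check_false_positive(decision)
    if is_false_positive:
        # The decision is revoked and the models are corrected.
        retrain(error_info)
        return "revoked"
    # The decision stands and is passed on to the other protection components.
    notify_components(decision)
    return "confirmed"


if __name__ == "__main__":
    notified, retrained = [], []
    outcome = handle_anomaly(
        {"process": "winword.exe", "anomaly": True},
        check_false_positive=lambda d: (False, None),
        notify_components=notified.append,
        retrain=retrained.append,
    )
    print(outcome, notified, retrained)
```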

Fig. 3 is a flowchart illustrating a method for detecting anomalies in the behavior of trusted processes based on Markov chains and a machine learning algorithm. The method is implemented using the components of the system described with reference to Fig. 1.

At step 310 (preliminary), the training module 110 carries out preparatory work, which includes creating a basic behavior model and an ML model for at least one trusted process. This preparatory work is described with reference to Fig. 1.

At step 320, the monitoring tool 130 detects the launch of a trusted process.

At step 330, the monitoring tool 130 selects the basic behavior model for the detected trusted process, where the model is expressed as a Markov chain.

At step 340, the monitoring tool 130 monitors the operation of the detected trusted process and, while doing so, calculates the overall probability of occurrence of the events taking place in the trusted process on the basis of the basic behavior model.

At step 350, the monitoring tool 130 determines whether the overall probability has crossed the predetermined threshold.

At step 360, if the overall probability has reached or crossed the predetermined threshold, the monitoring tool 130 extracts data from the resulting chain of events that have occurred and passes them to the analysis tool 140, namely to the input of the machine learning model (ML model).

It should be noted that, depending on the implementation, the analysis tool 140 may itself contain the required ML model. If the analysis tool 140 does not contain the required ML model, it first requests from the data store 120 the ML model corresponding to the trusted process and then passes the data to the input of that model.

At step 370, the analysis tool 140 makes a decision as to whether the trusted process exhibits anomalous behavior, based on the analysis of the data by the ML model.

Fig. 4 is a flowchart illustrating an example of generating a basic behavior model and a machine learning model (ML model) for a trusted process. The example is implemented using the components of the system described with reference to Fig. 1.

At step 410, the driver 180 collects information about the events occurring in the trusted process while the software is executing. The collected information is a behavioral log, which is passed to the training module 110.

At step 420, the training module 110 analyzes the behavioral log in order to recognize the events that occurred and to identify the parameters of the events that occurred while the trusted process was running. In one implementation, the parameters are the words found in the attributes of the events.

At step 430, the training module 110 additionally requests statistical data on the operation of a similar trusted process. The request is sent to the data store and/or to a remote server.

At step 440, the training module 110 determines a weighting factor for each event in the received behavioral log. In one implementation, the weighting factor for an event is derived from the probability of that event occurring across all behavioral logs in the collected statistics. Each event in the chain carries a probability, calculated as the ratio of the number of occurrences of that event for the current process to the total number of occurrences of similar events for the current process.

At step 450, the training module 110 forms the basic behavior model according to the Markov chain principle. The model is a tree-like sequence of the events that occur while the software is executing in the trusted process, and the probability of occurrence (weighting factor) is specified for each event.

At step 460, the training module 110 trains the ML model on the basis of the identified event parameters.

At step 470, the training module 110 adds the basic behavior model and the ML model to the data store 120.
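Steps 440 and 450 can be illustrated with a prefix tree built from behavioral logs. This is a sketch under assumptions, not the claimed implementation: the names `build_tree` and `path_probability` are hypothetical, and the probability of each event is estimated here as the count of logs passing through that node divided by the count for its parent, which is one common way to estimate Markov transition probabilities.

```python
def build_tree(logs):
    """Build a prefix tree of event sequences with visit counts per node."""
    root = {"count": 0, "children": {}}
    for log in logs:
        node = root
        node["count"] += 1
        for event in log:
            child = node["children"].setdefault(
                event, {"count": 0, "children": {}}
            )
            child["count"] += 1
            node = child
    return root


def path_probability(tree, path):
    """Multiply estimated transition probabilities along a path of events.

    Returns 0.0 for a path never seen in the logs (an assumption; the
    description does not say how unseen events are handled).
    """
    node = tree
    probability = 1.0
    for event in path:
        child = node["children"].get(event)
        if child is None:
            return 0.0
        probability *= child["count"] / node["count"]
        node = child
    return probability


if __name__ == "__main__":
    logs = [
        ["winword.exe", "application launch", "cmd.exe"],
        ["winword.exe", "application launch", "cmd.exe"],
        ["winword.exe", "file open"],
    ]
    tree = build_tree(logs)
    print(path_probability(tree, ["winword.exe", "application launch"]))
```

With this structure the root process always has probability 1.0 when every log starts with it, matching the treatment of the root process in the winword.exe example.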

Fig. 5 shows an example of a general-purpose computer system 20 that can be used as a client computer (for example, a personal computer) or as a server shown in Fig. 1. The computer system 20 comprises a central processing unit 21, a system memory 22, and a system bus 23, which connects the various system components, including the memory, to the central processing unit 21. The system bus 23 is implemented as any bus structure known in the art, which in turn comprises a bus memory or bus memory controller, a peripheral bus, and a local bus capable of interacting with any other bus architecture. The system memory comprises read-only memory (ROM) 24 and random-access memory (RAM) 25. The basic input/output system (BIOS) 26 contains the basic procedures that transfer information between the elements of the computer system 20, for example when the operating system is loaded using the ROM 24.

The computer system 20 in turn includes a hard disk 27 for reading and writing data, a magnetic disk drive 28 for reading from and writing to removable magnetic disks 29, and an optical drive 30 for reading from and writing to removable optical disks 31, such as CD-ROM, DVD-ROM, and other optical storage media. The hard disk 27, the magnetic disk drive 28, and the optical drive 30 are connected to the system bus 23 via the hard disk interface 32, the magnetic disk interface 33, and the optical drive interface 34, respectively. The drives and their associated computer storage media provide non-volatile storage of computer instructions, data structures, program modules, and other data of the computer system 20.

The present description discloses an implementation of a system that uses a hard disk 27, a removable magnetic disk 29, and a removable optical disk 31, but it should be understood that other types of computer storage media 56 capable of storing data in a computer-readable form may be used (solid-state drives, flash memory cards, digital disks, random access memory (RAM), etc.), connected to the system bus 23 through the controller 55.

The computer 20 has a file system 36 that stores the operating system 35, as well as additional software applications 37, other program modules 38, and program data 39. The user can enter commands and information into the personal computer 20 through input devices (keyboard 40, mouse 42). Other input devices (not shown) may be used: a microphone, joystick, game console, scanner, etc. Such input devices are usually connected to the computer system 20 through a serial port 46, which in turn is connected to the system bus, but they may be connected in other ways, for example through a parallel port, a game port, or a universal serial bus (USB). A monitor 47 or another type of display device is also connected to the system bus 23 via an interface such as a video adapter 48. In addition to the monitor 47, the personal computer may be equipped with other peripheral output devices (not shown), such as speakers, a printer, and the like.

The computer system 20 can operate in a networked environment, using a network connection to one or more remote computers 49. The remote computer (or computers) 49 are likewise personal computers or servers that have most or all of the elements mentioned above in the description of the computer system 20 shown in Fig. 5. Other devices may also be present in the computer network, such as routers, network stations, peer-to-peer devices, or other network nodes.

The network connections may form a local area network (LAN) 50 and a wide area network (WAN). Such networks are used in corporate computer networks and internal company networks and, as a rule, have access to the Internet. In LAN or WAN networks, the computer system (personal computer) 20 is connected to the local area network 50 via a network adapter or network interface 51. When using networks, the personal computer 20 may use a modem 54 or other means of communicating with a wide area network such as the Internet. The modem 54, which is an internal or external device, is connected to the system bus 23 via the serial port 46. It should be noted that these network connections are only examples and need not reflect the exact network configuration; in practice there are other ways of establishing a connection between one computer and another by technical means of communication.

In conclusion, it should be noted that the information given in the description consists of examples that do not limit the scope of the present invention as defined by the claims. A person skilled in the art will appreciate that other embodiments of the present invention may exist that are consistent with the spirit and scope of the present invention.

Claims

1. A method for detecting an anomaly in the behavior of a trusted process, in which:

a. a trusted process running on a computer system is detected;

b. a data store is queried for a basic behavior model, expressed as a Markov chain, and a machine learning model (ML Model) for the file from which the trusted process was launched;

c. data about the trusted process is collected using the basic behavior model, during which:

i. events that occur are tracked according to the Markov chain from the basic behavior model,

ii. the probability of occurrence of each event in the trusted process is determined by applying the Markov chain, and

iii. the overall probability of occurrence of the events that have occurred is calculated;

d. the overall probability of occurrence of the events that have occurred is compared with a predetermined threshold;

e. when the predetermined threshold is exceeded, the events that occurred while following the Markov chain are selected and data about those events is passed to the input of the ML Model, the data including the chain of events that have occurred;

f. the event data is analyzed using the ML Model, and a decision about anomalous behavior is made based on the analysis.
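Steps c to e of claim 1 can be illustrated with a minimal sketch. The claim does not state how the overall probability is aggregated or in which direction the threshold is crossed, so the sketch assumes an anomaly score equal to the sum of negative log transition probabilities: a higher score means a less likely chain, and the chain is flagged once the score exceeds the threshold. The model contents, event names, the `UNSEEN` floor value, and the function name `score_chain` are all hypothetical. The ML Model step is not shown.

```python
import math

# Hypothetical basic behavior model: Markov chain transition probabilities.
# Event names and values are illustrative only, not taken from the patent.
MODEL = {
    "start": {"open_file": 0.7, "load_library": 0.3},
    "open_file": {"read_file": 0.9, "create_process": 0.1},
    "read_file": {"close_file": 1.0},
}
UNSEEN = 1e-6  # assumed floor probability for transitions absent from the model


def score_chain(events, model, threshold):
    """Walk the chain and accumulate -log(p) for each transition.

    Returns (score, flagged_chain). flagged_chain holds the events from the
    initial event up to the one that pushed the score over the threshold,
    or None if the threshold was never exceeded.
    """
    score = 0.0
    prev = "start"
    for i, event in enumerate(events):
        p = model.get(prev, {}).get(event, UNSEEN)
        score += -math.log(p)
        if score > threshold:
            # This prefix is what would be handed to the ML Model.
            return score, events[: i + 1]
        prev = event
    return score, None


normal = ["open_file", "read_file", "close_file"]
suspicious = ["open_file", "create_process", "write_memory", "exit"]
print(score_chain(normal, MODEL, threshold=5.0))
print(score_chain(suspicious, MODEL, threshold=5.0))
```

Working in log space avoids numerical underflow when many small probabilities are multiplied. Any other monotone aggregation would fit the claim wording equally well.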

2. The method according to claim 1, in which the basic behavior model, expressed in the form of a Markov chain, contains chains of events that occur during the execution of a trusted process, with each event having its own probability of occurrence.

3. The method according to claim 2, in which the events are trusted, wherein trusted events have a high probability of occurrence in the chain, the probability being calculated as the ratio of the number of occurrences of a particular event for the current process to the total number of occurrences of similar events for the current process.
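The ratio in claim 3 can be sketched as a simple frequency estimate. The claim does not define "similar events"; this sketch assumes they are all events observed after the same preceding event in the logs of one process. The function name `estimate_transitions`, the `"start"` pseudo-event, and the sample traces are hypothetical.

```python
from collections import Counter, defaultdict


def estimate_transitions(traces):
    """Estimate Markov chain transition probabilities from event traces.

    For each preceding event, the probability of a following event is its
    count divided by the total count of events seen after that predecessor.
    """
    counts = defaultdict(Counter)
    for trace in traces:
        prev = "start"  # assumed pseudo-event marking the beginning of a trace
        for event in trace:
            counts[prev][event] += 1
            prev = event
    return {
        prev: {ev: n / sum(nxt.values()) for ev, n in nxt.items()}
        for prev, nxt in counts.items()
    }


traces = [
    ["open_file", "read_file"],
    ["open_file", "create_process"],
    ["open_file", "read_file"],
]
model = estimate_transitions(traces)
print(model["open_file"])
```

Under this assumption, frequently repeated transitions receive probabilities close to 1, which matches the claim's statement that trusted events have a high probability of occurrence in the chain.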

4. The method according to claim 1, in which the ML Model is pre-trained on the occurrence of words found in the attributes of the events in the chain.
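As a rough illustration of the word-occurrence input mentioned in claim 4, the sketch below counts the words found in the attributes of a chain of events. The patent does not specify the event format, the tokenization rule, or the model type; here events are assumed to be dictionaries of attribute strings, and words are assumed to be lowercase runs of letters, digits, and underscores. The function name `attribute_words` and the sample events are hypothetical.

```python
import re
from collections import Counter


def attribute_words(events):
    """Count word occurrences across the attribute values of a chain of events."""
    counts = Counter()
    for event in events:
        for value in event.values():
            # Assumed tokenization: lowercase alphanumeric/underscore runs.
            counts.update(re.findall(r"[a-z0-9_]+", str(value).lower()))
    return counts


chain = [
    {"type": "create_process", "path": "C:\\Windows\\System32\\cmd.exe"},
    {"type": "open_file", "path": "C:\\Temp\\a.txt"},
]
print(attribute_words(chain))
```

The resulting counts could serve as a feature vector for whatever classifier plays the role of the ML Model.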

5. The method according to claim 1, in which the ML Model was trained in advance on logs both with and without anomalies, where, as a result of the training, anomalous events are identified by low coefficients.

6. The method according to claim 1, in which, when anomalous behavior is detected, the trusted process is blocked and the user is notified.

7. The method according to claim 1, in which, when a decision about anomalous behavior is made, a check for false positives is carried out by sending data about the decision to the component responsible for that check and receiving a response with the results of the check.

8. The method according to claim 7, in which the result of the false-positive check is received and, if an error is found in the decision about the detected anomaly, the ML Model is retrained using the data whose analysis produced the error.

9. The method according to claim 1, in which the chain of events that have occurred includes the events from the initial event to the event after which the predetermined threshold was reached.

10. A system comprising at least one computer that includes means interacting with each other: a learning module, a control tool, an analysis tool, a data store containing statistical data on the operation of processes, basic behavior models and ML models, and stored machine-readable instructions which, when executed, cause the system to detect an anomaly in the behavior of a trusted process by the method according to any one of claims 1-9.