IT201800005412A1

IT201800005412A1 - System and method for the creation and verification of behavioral baselines.

Info

Publication number: IT201800005412A1
Application number: IT102018000005412A
Authority: IT
Priority date: 2018-05-16
Filing date: 2018-05-16
Publication date: 2019-11-16
Also published as: WO2019220363A1; EP3794481A1

Description

"Sistema e metodo per la creazione e la verifica di baseline comportamentali" "System and method for the creation and verification of behavioral baselines"

DESCRIPTION

The present invention relates to a system and a method for creating and verifying behavioral baselines, particularly, though not exclusively, useful and practical in the field of corporate or infrastructure IT security.

In this description, the term "user" denotes a natural person who uses one or more computer systems belonging to a more or less complex corporate information system and who holds one or more accounts, while the term "account" denotes an entity registered and authorized to access one of those computer systems, typically defined by a username and password.

In this description, the expression "behavioral baseline" denotes the behavioral model or profile of at least one entity, consisting of the set of actions and operations that the same user or apparatus has performed, and would therefore perform, under normal conditions (that is, in a "base" situation). In what follows, users are treated as the entities to be observed, although the invention can be generalized to treat as an entity any apparatus that generates emissions in a broad sense.

Today, corporate systems and networks are particularly difficult to protect from cyber threats owing to a number of factors, including the ability of attackers to operate from anywhere in the world, the connections between the Internet and corporate systems, and the difficulty of reducing vulnerabilities in complex computer networks.

Furthermore, cyber threats to critical infrastructures (such as power plants, airports, hospitals, and so on) are of growing concern, since such infrastructures face increasing risk from ever more sophisticated intrusion attempts. As Information Technology (IT) becomes ever more integrated with the operation of critical infrastructures, the risk grows of large-scale or high-impact events that could cause damage or disrupt services on which the global economy and the daily lives of millions of people depend.

In light of the high risk and the potential consequences of cyber attacks, strengthening cyber security and resilience has become an important security mission for all businesses and governments.

For businesses, the rapid detection of cybersecurity threats is critical to preventing the compromise of their computer systems and the theft of data. Much of this data consists of highly important commercial and/or personal information, or even trade secrets, and is not intended for disclosure or public viewing; any exposure, theft or manipulation of this data could cause economic or reputational damage to the organization and to the individuals themselves. A large number of these cyber attacks, as reported by the media, have involved fraud, data breaches, intellectual property or national security.

Enterprises typically use a layered, compartmentalized network topology to separate the internal network from the Internet. Workstations and servers are generally protected from direct access via the Internet or other external networks by a proxy server; Internet traffic is typically terminated in so-called "demilitarized zones" of the corporate network, and incoming traffic is filtered through one or more firewalls.

Normally, attackers concentrate their attacks on the elements exposed toward the external perimeter of the corporate network, and many solutions exist that provide perimeter security. However, the boundaries of modern corporate networks are no longer so well defined: the growing use of cloud applications and of mobile devices has made the corporate network more fluid and dynamic, with boundaries that are difficult to identify.

Once attackers have breached the perimeter and entered the corporate network, they typically operate under the guise of an internal user, either stealing an existing user's account or creating a new one. They use legitimate accounts or trusted systems, and can move freely within the corporate network by exploiting the lack of effective monitoring of the internal network.

Currently, various security solutions are known that are able to detect potentially malicious activity by a user. Most known security solutions use a static and reactive approach, searching for signatures of known attacks in order to identify and report the presence of similar attacks.

However, these known solutions, which rely on a signature-based threat detection approach, are not free from drawbacks. One is that developing signatures for new threats requires in-depth analysis of an infected system, consuming a great deal of time and resources, which in any case prove insufficient to deal with rapidly evolving threats. Furthermore, signatures do not adapt to the very rapid changes that threat vectors undergo. Finally, signature-based approaches are ineffective against so-called zero-day attacks, which exploit vulnerabilities that are unknown and therefore cannot be used to detect the threats.

An evolution of this signature-based threat detection approach consists in detecting internal attacks by manually constructing various profiles of "normal" user behavior, detecting deviations from these profiles, and estimating the threat risk of these anomalies.

A widespread approach is to build a baseline of the user's behavior (or, more frequently, of one of the user's accounts, or of one of the IP addresses used by the user, thus incurring the limitations described above), so as to "learn" the operations that the user normally performs.

Generally, a time window of a certain predetermined length is fixed. After the learning phase, systems based on machine learning algorithms treat deviations from this baseline as anomalies.

The drawback of this approach is that the order in which the operations are performed within the window is completely ignored. As long as the operations have already been performed in the past, and as long as their frequency stays within a certain mean and variance, the user's behavior will appear consistent with his or her baseline in the analyzed window.
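The limitation described above can be illustrated with a minimal sketch. The activity names and the two windows below are hypothetical, chosen only for illustration; the sketch compares a frequency-only profile with a profile of ordered activity pairs.

```python
from collections import Counter


def frequency_profile(activities):
    """Count how often each activity occurs in a window, ignoring order."""
    return Counter(activities)


def transition_profile(activities):
    """Count ordered pairs (previous activity, next activity) in a window."""
    return Counter(zip(activities, activities[1:]))


# Hypothetical activity names, not taken from the source.
usual_window = ["login", "read_record", "export_report", "logout"]
odd_window = ["login", "export_report", "read_record", "logout"]

# The same activities with the same counts: a frequency-only baseline
# cannot tell the two windows apart.
same_frequencies = frequency_profile(usual_window) == frequency_profile(odd_window)

# The ordered pairs differ, so a profile that tracks order can.
same_transitions = transition_profile(usual_window) == transition_profile(odd_window)

print(same_frequencies, same_transitions)
```

Both windows contain each activity exactly once, so only a representation that records which activity follows which can separate them.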

However, these known solutions, which rely on a threat detection approach based on profiles of normal behavior, are not free from drawbacks either. One is that constructing and configuring profiles that accurately characterize a user's normal behavior is very difficult, since human behavior is extremely changeable and highly variable.

Moreover, using such profiles to detect behavioral anomalies can produce erroneous results and lead to many false positives that overwhelm cybersecurity analysts. Striking a balance between an overly permissive detection method, which risks missing a real threat, and a tightly meshed detection method, which floods security analysts with alerts, is a difficult trade-off.

To this must be added the further drawback introduced by the currently widespread practice of using only partial information about the user's activities as input data. In particular, it is usually not possible to attribute to the same user the emissions originating from that user's different accounts, which reduces the ability to correlate across the whole of the user's activities. If it is not known that two activities are the work of the same user, it is evidently impossible to detect all possible fraud patterns effectively.

The principal aim of the present invention is to overcome the limitations of the prior art described above by devising a system and a method for creating and verifying behavioral baselines based on activities, or sequences of activities, carried out by the entities (both users and apparatuses) that operate within a corporate network, in order to identify any behavioral deviations that could represent harmful activities and cyber threats.

Within this aim, an object of the present invention is to devise a system and a method for creating and verifying behavioral baselines that enable a dynamic, adaptive and proactive method of detecting cyber threats, in order to counter external and internal threats that are continuously evolving and unknown a priori.

Another object of the present invention is to devise a system and a method for creating and verifying behavioral baselines that enable a method of detecting behavioral anomalies which is not driven by pre-established signatures or policies, but which adapts to the behavior of the entities and to the use of the corporate information systems.

A further object of the present invention is to devise a system and a method for creating and verifying behavioral baselines that minimize false positives, thus increasing the effectiveness of the actions taken in response to a cyber attack or threat.

Still another object of the present invention is to devise a system and a method for creating and verifying behavioral baselines that are not limited to correlating the activities of accounts, but that correlate the activities of the individual physical user, regardless of the number of accounts assigned to that user.

Not least, an object of the present invention is to provide a system and a method for creating and verifying behavioral baselines that are highly reliable, relatively simple to implement, and competitive in cost when compared with the prior art.

This aim, as well as these and other objects which will become more apparent hereinafter, are achieved by a system for creating and verifying behavioral baselines, comprising a central processing device that comprises a control unit and enriched data storage means and that is connected to and in communication with a plurality of target apparatuses and with an Identity & Access Management (IAM) apparatus, characterized in that said central processing device comprises:

- a Markov module, configured to construct a Markov transition matrix suitable for tracking the passage from a first activity to a second, temporally subsequent activity, both said activities being defined by enriched data and carried out by an entity on said target apparatuses;

- a baseline module, configured to calculate a plurality of individual z-score values, one for each activity-entity pair, and a plurality of collective z-score values, one for each activity-time window pair;

- a historical anomaly verification module, configured to evaluate the presence of a behavioral anomaly of said entity with respect to an individual space, that is, with respect to the history of the past activities of said entity, on the basis of said plurality of individual z-score values; and

- a peer anomaly verification module, configured to evaluate the presence of a behavioral anomaly of said entity with respect to a collective space, that is, with respect to the current activities of the other similar (peer) entities, on the basis of said plurality of collective z-score values.
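As a rough illustration of what the Markov module tracks, the sketch below builds a first-order transition matrix from one entity's ordered list of activities. It is only a sketch under stated assumptions: the source does not say how the matrix is stored or normalized, so the nested-dictionary layout, the row normalization into probabilities, the function name `build_transition_matrix` and the activity names are all hypothetical.

```python
from collections import defaultdict


def build_transition_matrix(activities):
    """Estimate first-order transition probabilities from an ordered activity list.

    Returns a nested dict: matrix[first][second] is the observed fraction of
    times that `second` immediately followed `first`.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for first, second in zip(activities, activities[1:]):
        counts[first][second] += 1

    matrix = {}
    for first, row in counts.items():
        total = sum(row.values())
        # Row normalization is an assumption; raw counts would also work.
        matrix[first] = {second: n / total for second, n in row.items()}
    return matrix


# Hypothetical activity history of a single entity, in temporal order.
history = ["login", "read", "read", "export", "logout", "login", "read", "logout"]
matrix = build_transition_matrix(history)

for first, row in matrix.items():
    print(first, row)
```

A transition that never appears in the matrix (here, for example, "login" followed directly by "logout") is one the entity has never been observed to make, which is the kind of information a frequency-only profile loses.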

The intended aim and objects are also achieved by a method for creating and verifying behavioral baselines, by means of a central processing device that comprises a control unit and enriched data storage means and that is connected to and in communication with a plurality of target apparatuses and with an Identity & Access Management (IAM) apparatus, comprising the steps of:

- constructing a Markov transition matrix suitable for tracking the passage from a first activity to a second, temporally subsequent activity, by means of a Markov module included in said central processing device, both said activities being defined by said enriched data and carried out by an entity on said target apparatuses;

- calculating a plurality of individual z-score values, one for each activity-entity pair, and a plurality of collective z-score values, one for each activity-time window pair, by means of a baseline module included in said central processing device;

- evaluating the presence of a behavioral anomaly of said entity with respect to an individual space, that is, with respect to the history of the past activities of said entity, on the basis of said plurality of individual z-score values, by means of a historical anomaly verification module included in said central processing device; and

- evaluating the presence of a behavioral anomaly of said entity with respect to a collective space, that is, with respect to the current activities of the other similar (peer) entities, on the basis of said plurality of collective z-score values, by means of a peer anomaly verification module included in said central processing device.
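The z-score steps above can be sketched as follows. The passage does not give the formula or the thresholds, so this sketch assumes the standard definition z = (x - mean) / standard deviation applied to per-window activity counts; the sample counts, the threshold of 3 and the helper name `z_score` are hypothetical.

```python
import math
from statistics import mean, pstdev


def z_score(value, sample):
    """Standard z-score of `value` against `sample` (population std. deviation)."""
    mu = mean(sample)
    sigma = pstdev(sample)
    if sigma == 0:
        # Degenerate sample: any deviation at all is infinitely unusual.
        return 0.0 if value == mu else math.copysign(math.inf, value - mu)
    return (value - mu) / sigma


# Hypothetical counts of one activity performed by one entity in past windows.
own_history = [2, 4, 4, 4, 5, 5, 7, 9]
# Hypothetical counts of the same activity by peer entities in the current window.
peers_now = [1, 2, 3, 2, 2]
# Count observed for the entity in the current window.
current = 11

individual_z = z_score(current, own_history)  # against the entity's own past
collective_z = z_score(current, peers_now)    # against the peers' present

THRESHOLD = 3.0  # assumed cut-off, not specified in the source
anomalous_vs_history = abs(individual_z) >= THRESHOLD
anomalous_vs_peers = abs(collective_z) >= THRESHOLD
print(individual_z, collective_z, anomalous_vs_history, anomalous_vs_peers)
```

Keeping the two checks separate mirrors the split between the historical and the peer verification: an entity can be unusual relative to its own past while behaving like its peers, or the reverse.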

Further characteristics and advantages of the invention will become more apparent from the description of a preferred but not exclusive embodiment of the system and method for creating and verifying behavioral baselines according to the invention, illustrated by way of non-limiting example with the aid of the accompanying drawings, in which:

Figure 1 is a block diagram that schematically illustrates an embodiment of the system for creating and verifying behavioral baselines according to the present invention;

Figure 2 is a flow chart that schematically illustrates an embodiment of the method for creating and verifying behavioral baselines according to the present invention.

With reference to Figure 1, the system for creating and verifying behavioral baselines according to the invention, designated as a whole by the reference numeral 10, comprises a central processing device 12, hereinafter central computer 12, connected to and in communication with a plurality of target apparatuses 36 and with an Identity & Access Management apparatus 38, hereinafter IAM apparatus 38, for example via a local area network (LAN).

The target apparatuses 36 are computers (for example servers), electronic devices, or network apparatuses, configured through software applications or the like and belonging to a more or less complex information system, which are able to reveal, directly or indirectly, the presence of attackers and cyber attacks, and which can therefore be sources of emissions worth collecting as input data.

The IAM (Identity and Access Management) apparatus 38 is an apparatus configured to manage user accounts and the authorizations of those accounts within an information system, such as the one to which the central computer 12 and the target apparatuses 36 belong.

In particular, the IAM apparatus 38 centrally manages the user accounts (typically represented by one username per account), the security credentials (typically represented by a password or access key associated with the account's username), and the authorizations that ensure that users holding accounts can access all and only the resources for which they are responsible.

The central computer 12 comprises a control unit 14, a raw event collection module 16, an IAM (Identity and Access Management) state collection module 18, a data enrichment module 20, enriched data storage means 22, a Markov module 24, a baseline module 26, a historical anomaly verification module 28, and a peer anomaly verification module 30.

The control unit 14 is the main functional element of the central computer 12, and for this reason it is connected to and in communication with the other elements included in the central computer 12.

L'unità di controllo 14 dell'elaboratore centrale 12 è dotata di opportune capacità di calcolo e di interfacciamento con gli altri elementi dell'elaboratore centrale 12, ed essa è configurata per comandare, controllare e coordinare il funzionamento degli elementi dell'elaboratore centrale 12 con i quali essa è collegata e in comunicazione. The control unit 14 of the central computer 12 is equipped with suitable computing and interfacing capabilities with the other elements of the central computer 12, and it is configured to command, control and coordinate the operation of the elements of the central computer 12. with which it is connected and in communication.

The raw event collection module 16 of the central computer 12 is configured to collect the raw (that is, non-normalized) data on the events recorded by the target devices 36 connected to the central computer 12, treating them as emissions of those same target devices 36. The raw event collection module 16 is further configured to normalize this event data, merging similar items of information under the same semantic name so that they can be grouped.
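Purely by way of non-limiting illustration, the normalization step described above could be sketched in Python as follows. The alias table, the function name and every field name are assumptions introduced for this example; they are not taken from the description or from any real SAP, DBMS or Apache log format.

```python
# Hypothetical alias table: source-specific field name -> shared semantic name.
FIELD_ALIASES = {
    "user": "username",
    "usr": "username",
    "login": "username",
    "src_ip": "source_ip",
    "client_addr": "source_ip",
    "host": "hostname",
    "ts": "timestamp",
    "time": "timestamp",
    "op": "action",
    "event": "action",
}


def normalize_event(raw_event: dict) -> dict:
    """Rename source-specific keys to a shared semantic name; keep unknown keys."""
    normalized = {}
    for key, value in raw_event.items():
        canonical = FIELD_ALIASES.get(key.lower(), key.lower())
        normalized[canonical] = value
    return normalized


# Two hypothetical raw events coming from different target devices.
sap_like = {"USR": "j.doe", "TS": "2018-05-16T09:00:00", "OP": "login"}
web_like = {"client_addr": "10.0.0.7", "time": "2018-05-16T09:00:05", "event": "GET /index"}

normalized_events = [normalize_event(sap_like), normalize_event(web_like)]
```

After normalization both events expose a "timestamp" and an "action" key, so they can be grouped even though the originating systems used different field names.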

In practice, an emission corresponds to each activity of one of the target devices 36. For example, the emissions of the target devices 36 may include log file lines of SAP systems, DBMS audit events, lines of the Apache httpd audit file, syslog events, and so on.

The IAM status collection module 18 of the central computer 12 is configured to collect data on the current status recorded by the IAM apparatus 38, which manages the user accounts and the authorizations of those accounts. In this way, an up-to-date mapping of all of a user's accounts on the target devices 36 is obtained from the IAM apparatus 38.

The data enrichment module 20 of the central computer 12 is configured to cross-reference the raw event data coming from the raw event collection module 16 with the IAM status data coming from the IAM status collection module 18, identifying and grouping the raw events associated with a specific physical user, or more generally with an entity, regardless of the number of accounts that the same user holds.

Raw events always contain information about the user who generated the emission. Depending on the target device 36, this information about the user who is the "author" of the emission may comprise a source IP address, a hostname, a username, or a combination of these. In particular, the username can always be traced back to a user thanks to the constantly updated mapping retrieved from the IAM apparatus 38.

By integrating all this information, the data enrichment module 20 traces back to the user who operates by means of a certain username, who is using a certain IP address, or who is tracked through a certain hostname.
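As a minimal sketch of the resolution just described, and not as the actual implementation of the data enrichment module 20, the lookup from username, IP address or hostname to a physical user could look like the following Python fragment. The two mapping tables, their contents and the function names are hypothetical; in particular, the device table is an assumed stand-in for whatever source associates devices with users.

```python
from typing import Optional

# Hypothetical snapshot of the IAM state: account username -> physical user.
IAM_ACCOUNTS = {
    "j.doe@acme.org": "John Doe",
    "johnny@acme.org": "John Doe",
}
# Hypothetical device assignments: IP address or hostname -> physical user.
DEVICE_OWNERS = {
    "10.0.0.7": "John Doe",
    "ws-0042": "John Doe",
}


def resolve_user(event: dict) -> Optional[str]:
    """Return the physical user behind an event: username first, then IP, then hostname."""
    username = event.get("username")
    if username in IAM_ACCOUNTS:
        return IAM_ACCOUNTS[username]
    for key in ("source_ip", "hostname"):
        value = event.get(key)
        if value in DEVICE_OWNERS:
            return DEVICE_OWNERS[value]
    return None  # the event cannot be attributed to a known user


def enrich(event: dict) -> dict:
    """Return a copy of the event with the resolved user added as context."""
    return {**event, "user": resolve_user(event)}


enriched = [
    enrich({"username": "johnny@acme.org", "action": "login"}),
    enrich({"source_ip": "10.0.0.7", "action": "connect"}),
    enrich({"hostname": "unknown-host", "action": "connect"}),
]
```

The first two events end up attributed to the same physical user even though one carries a username and the other only an IP address; the third stays unattributed.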

Enriching the available data before the analysis maximizes the capacity for correlation. Usually, in order to access computer systems, a user performs a login, an operation through which the user is identified by the system. The login takes place by means of personal credentials, typically a username and a password. The login is therefore the operation that allows the user to access his or her account on the computer system. This account, distinguished by a unique username, may have been configured by the system administrator to access all the functions offered by the system, or may itself be an administration account, or may have only a subset of the functions offered by the system.

However, a user frequently holds several accounts distributed across the various corporate computer systems. A user may also hold more than one account on the same computer system, and can therefore use the same system with different credentials (that is, different usernames) and hence different levels of authorization.

Computer systems, in particular corporate ones, often keep a log file or some other register in which they record the operations carried out by users on the system. In the best case, the user is tracked by means of the username with which he or she authenticated. As mentioned above, however, some systems merely report the hostname or the IP address of the device used by the user.

The family of computer systems that tracks the actions of the physical user by means of the username is the best case because, starting from the log file or from the activity register of the system, the path back to the physical user is relatively simple: it is enough to identify the holder of the account whose username appears in the log file.

These systems, taken individually, therefore keep track of the actions performed by accounts, and cannot trace back to the user who is operating with a given account. As mentioned above, this information exists (see the IAM apparatus 38), but it does not fall within the competence of the individual system.

For example, in the case of a user having two accounts on the same system, the operations of the accounts would appear in the log file as a record of events associated with two different usernames. The fact that the same physical user is actually operating behind both usernames, and therefore behind both accounts, cannot be deduced directly.

As mentioned above, establishing that both accounts belong to the same physical user requires further information on the assignment or ownership of the accounts. This information is held in the IAM apparatus 38, and consists of the assignment of the various usernames to the respective physical users.

Therefore, the data enrichment module 20 is configured to cross-reference account and user information, preferably in real time, in order to "enrich" the data on the activities of the accounts with the "context" information linked to the user, which is then used in the data correlation phase.

This data enrichment operation, carried out before the analysis, drastically increases the quality of the data used to identify anomalies, security incidents and, ultimately, fraud patterns. Enrichment makes it possible to cross-reference correctly all the events attributable to the same physical user, regardless of how many of the accounts available to that user were used. This reconstructs a complete user context and maximizes the ability to correlate the events produced by the user within the corporate infrastructure.

In the following, therefore, any reference to operations or activities carried out by a user assumes that all the accounts of that user have previously been traced back to his or her real and unique identity.

As mentioned above, some computer systems operate differently and are not account-based. This family of computer systems, which includes for example network-level security devices and probes, reports in the log file or in the activity register the hostname or the IP address of the device through which the user is connected to the corporate network.

In a corporate context, however, the assignee of the devices connected to the corporate network can be identified thanks to an asset management policy and to enforcement measures that limit connections from unknown devices. Once again, therefore, the physical user can be substituted for the IP address he or she uses, as well as for the hostname of his or her device. On the other hand, a physical user could use several hostnames or several IP addresses, even at the same time. Once again, performing this resolution in real time, and thus assigning device-related events to the respective users before the analysis, greatly improves the ability to correlate events and to detect anomalous behavior effectively.

Consider the example of a user who uses three IP addresses at the same time. The events generated by these three IP addresses must be treated as sequential operations of the same user. If, on the contrary, the analysis is limited to the events of the individual IP addresses, and the user is identified only when a certain IP address behaves anomalously, a good part of the information is clearly lost. Indeed, an anomalous behavior that can be detected only by adding together the events of the three IP addresses belonging to the same user would go unnoticed if the IP addresses were considered individually.
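The three-address example can be made concrete with a small Python sketch. The fixed threshold used here is only a stand-in for the anomaly check and is not the verification method of the present description; the addresses, counts and names are likewise hypothetical.

```python
from collections import Counter

# Hypothetical threshold and data, chosen only to illustrate the point.
THRESHOLD = 10
IP_TO_USER = {
    "10.0.0.7": "John Doe",
    "10.0.0.8": "John Doe",
    "10.0.0.9": "John Doe",
}

# (source IP, number of sensitive operations observed in one time window)
observed = [("10.0.0.7", 4), ("10.0.0.8", 4), ("10.0.0.9", 4)]

per_ip = Counter()
per_user = Counter()
for ip, count in observed:
    per_ip[ip] += count                 # analysis downstream of resolution
    per_user[IP_TO_USER[ip]] += count   # analysis upstream of resolution

flagged_ips = [ip for ip, n in per_ip.items() if n > THRESHOLD]
flagged_users = [user for user, n in per_user.items() if n > THRESHOLD]
```

No single IP address exceeds the threshold, so `flagged_ips` is empty, while the same events summed per user do exceed it and `flagged_users` contains the user.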

In this case too, therefore, attention is drawn to the greater potential offered by cross-referencing IP address and user, or hostname and user, upstream of the analysis rather than downstream of it.

In using corporate systems, users therefore leave traces in log files and registers. As seen above, however, the user does not appear in the log files with his or her name and surname, nor can a globally unique identifier of the user be present. The log files contain only references to an account, that is, to a username that the user employed on that system. Alternatively, in the case of network probes and devices, they contain references to the IP address or the hostname from which the user connected.

All companies keep these event registers, since specific regulations (HIPAA, Sarbanes-Oxley, PCI-DSS, FISMA) require them to be collected and retained for a certain number of years in order to facilitate audits and forensic analysis. Otherwise, the company would be in breach of compliance with those regulations.

These data are entirely heterogeneous, since no standard is common to the many corporate systems. Activities, that is, the events determined by the actions of the user on the system, are tracked in a customized way that differs from system to system. There are, however, some points in common. For example, in almost all cases the single entry for a single recorded event contains:

- a date relating to the event or activity;

- an entity that generated the event, or to which the activity refers (identified differently depending on the system: it could be a username, a hostname or an IP address);

- an action, a description of the generated event or of the activity carried out, an operation from a set of possible operations, or a recipient that undergoes the event.

In any case, since a date is present, or more generally since log files and registers are ordered chronologically, it is always possible to analyze the events in the order in which they were generated by the activity of the user.
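A minimal Python sketch of such an entry and of its chronological ordering is given below. The pipe-separated line format, the `Entry` type and the sample lines are assumptions made for this example only; real log formats differ from system to system, as noted above.

```python
from datetime import datetime
from typing import NamedTuple


class Entry(NamedTuple):
    when: datetime   # date of the event or activity
    entity: str      # username, hostname or IP address, depending on the system
    action: str      # description of the event or operation


def parse_entry(line: str) -> Entry:
    """Parse a hypothetical 'ISO-timestamp|entity|action' line."""
    stamp, entity, action = line.split("|", 2)
    return Entry(datetime.fromisoformat(stamp), entity, action)


# Hypothetical entries collected from three different systems, out of order.
lines = [
    "2018-05-16T09:10:00|10.0.0.7|connect",
    "2018-05-16T09:00:00|j.doe@acme.org|login",
    "2018-05-16T09:05:00|ws-0042|dns-query",
]

# Tuples compare field by field, so sorting orders the entries by date first.
timeline = sorted(parse_entry(line) for line in lines)
```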

The enriched data storage means 22 of the central computer 12, such as a database stored on suitably sized memory media, are configured to store the data relating to each user and produced by the data enrichment module 20 according to the procedure described above.

By way of example, the enriched data storage means 22 can store the data relating to the user John Doe, to the accounts associated with that user and identified by the usernames j.doe@acme.org and johnny@acme.org, and to the activities carried out by that user in the various computer systems.

The Markov module 24 of the central computer 12 is configured to build a Markov chain, or more precisely a Markov transition matrix, suitable for tracking the passage from one emission to the next, that is, from one activity to the temporally subsequent one, both activities being defined by the enriched data and carried out by a specific physical user on the target devices 36. Clearly, the size of the matrix depends on how many emissions are considered at once.

The purpose of building the Markov transition matrix is to create an operational context in which behaviors that are anomalous with respect to previously observed behaviors can be identified, that is, to identify new activities or sequences of activities that deviate significantly from a normality defined by the activities or sequences of activities observed during an initial period called the learning period.

The data used by the Markov module 24 to build the Markov transition matrix are referred to as emissions, and correspond to the enriched data stored in the enriched data storage means 22 of the central computer 12.

A corporate IT infrastructure (centralized in a single datacenter, geographically distributed across several sites, cloud-based or hybrid) comprises electronic entities, which may be physical or virtual, interconnected by means of one or more communication protocols. Authorized users interact with these entities by means of certain specific entities, namely end-user devices. Typical examples of entities are servers (local, remote, physical or virtualized), databases, network and security devices, workstations, smartphones, printers, IP cameras and any other device that can interface with the infrastructure.

An emission is a single record of information concerning an entity, and can be produced by the entity itself or by another of the entities connected to the infrastructure. An emission is characterized in time by a precise date (typically with a precision of the order of seconds or milliseconds). An emission contains at least one reference to an entity acting as "source", and at least one "event" or "action", that is, a more or less rigorous description of what happened, was measured or was observed at the indicated time. Depending on the context, the entity referred to as "source" may be the subject that performs the described action, or the object of the reported observation.

Typical examples of repositories that are present in corporate infrastructures and contain the emissions of the entities making up the infrastructure are: log files, system registers, audit tables, syslog collectors, WMI collectors and NetFlow collectors.

In particular, the Markov module 24 operates as follows. Suppose there is a certain number U of users who perform activities in a precise time sequence. Suppose the users are indexed by u = 1 ... U, the activities by a = 1 ... A, and the time windows by f = 1 ... F. These activities can be any sequence of events, detections or measurements concerning the user. In general they are linked to the emissions of the user, and are considered in chronological order.

The Markov transition matrix describes a chronologically ordered sequence of activities performed by the users within each time window. Note that the term "activity" is used generically: an activity could mean having observed a certain emission, having observed two in succession, or having observed a particular, arbitrarily complex sequence of emissions with which a specific activity is to be associated.

The Markov transition matrix is organized as a three-index tensor A whose element $A_{uaf}$ defines how many times user u performs activity a in window f. With the activities defined in this way, the focus is therefore on the frequencies of the activities.

The time windows f have length Δt (for example one day). The learning period during which normal behaviors are defined has length T = FΔt (for example one year).
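For illustration only, the assignment of a timestamp to its window index f could be sketched in Python as follows, taking Δt as one day and F as 365 windows in line with the examples above. The start date of the learning period and all names are hypothetical.

```python
from datetime import datetime, timedelta

LEARNING_START = datetime(2018, 1, 1)  # hypothetical start of the learning period
WINDOW_LENGTH = timedelta(days=1)      # Δt, one day
NUM_WINDOWS = 365                      # F, so that T = F·Δt is one year


def window_index(moment: datetime):
    """Return f in 1..F for a timestamp inside the learning period, else None."""
    offset = moment - LEARNING_START
    f = offset // WINDOW_LENGTH + 1  # timedelta // timedelta yields an integer
    return f if 1 <= f <= NUM_WINDOWS else None
```

A timestamp before the start, or after the last window, returns `None` and would simply fall outside the learning period.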

The activity of interest to monitor is each passage from one emission to the next. That is, given a sequence of events observed by looking at the emissions of a certain entity (for example a user) within a day, the Markov transition matrix is organized so that the tensor A contains the frequencies with which users u, in each window f, pass from one emission to the next, thereby performing activity a. If T is the total number of emissions observed, the number of activities A can therefore be at most $T^2$, which occurs when all the ordered pairs of possible emissions are performed during the learning period. (In this bound, T and A denote counts, and are distinct from the length T of the learning period and from the tensor A.)

During the learning period, the Markov module 24 builds the three-index tensor A, whose element $A_{uaf}$ defines how many times user u performs activity a (that is, passes from a certain emission e to another emission e') in the window indexed by f. These activity frequencies constitute the benchmark against which a possible anomaly is subsequently verified.
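A minimal Python sketch of how such transition counts could be accumulated is given below; it is one possible reading and not the implementation of the Markov module 24. It assumes that a window is a calendar day, that an activity is an ordered pair of consecutive emissions, and that a transition spanning two windows is not counted. The sample data and all names are hypothetical, and a nested dictionary stands in for the tensor.

```python
from collections import defaultdict
from datetime import datetime

# Enriched emissions as (user, ISO timestamp, emission label); hypothetical data.
emissions = [
    ("John Doe", "2018-05-16T09:00:00", "foo"),
    ("John Doe", "2018-05-16T09:05:00", "bar"),
    ("John Doe", "2018-05-16T09:10:00", "foobar"),
    ("John Doe", "2018-05-17T09:00:00", "foo"),
    ("John Doe", "2018-05-17T09:02:00", "bar"),
]


def build_transition_counts(records):
    """Return counts[user][(e, e_next)][window] for consecutive emissions in a window."""
    by_user_window = defaultdict(list)
    for user, stamp, label in records:
        moment = datetime.fromisoformat(stamp)
        by_user_window[(user, moment.date())].append((moment, label))

    counts = defaultdict(lambda: defaultdict(lambda: defaultdict(int)))
    for (user, window), items in by_user_window.items():
        items.sort()  # chronological order inside the window
        labels = [label for _, label in items]
        for current, following in zip(labels, labels[1:]):
            counts[user][(current, following)][window] += 1
    return counts


tensor = build_transition_counts(emissions)
```

Here `tensor[u][a][f]` plays the role of the element $A_{uaf}$, with the activity a identified by the pair (e, e').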

The baseline module 26 of the central computer 12 is configured to analyze the transitions between the activities carried out by a specific physical user, or more generally by an entity, by means of known machine learning techniques, starting from the Markov transition matrix for each user u and each time window f (for example one day). In particular, the baseline module 26 is configured to calculate a plurality of individual z-score values, one for each activity-entity pair (the entity being, for example, a user), and a plurality of collective z-score values, one for each activity-window pair. In practice, the z-score values represent the probability that an activity or a sequence of activities of a specific physical user, or more generally of an entity, constitutes a behavioral anomaly. This provides a quantitative index for the observed series of emissions.

This learning approach avoids discarding, as happens in traditional analyses, a great deal of information useful for identifying the user, in particular information on the presence of repeated chronological patterns. Put simply, instead of merely "learning" that the user usually carries out the operations "foo", "bar" and "foobar" during a certain time window, the baseline module 26 learns, for example, that the user always performs "bar" once after executing "foo", and then performs "foobar" once after executing "bar", that is, the user reproduces the pattern "foo", "bar", "foobar" in this precise order.

As a result, an attacker who executed "bar", "foo", "foobar" within the same time interval would appear in line with the baseline according to traditional approaches, whereas the present approach would be able to identify the sequence as anomalous.
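The difference between the two approaches can be shown with a short sketch. The two helper functions are hypothetical names introduced here for illustration: one compares only which operations occurred, the other compares the ordered transitions between them.

```python
from collections import Counter

def operation_counts(seq):
    """Traditional view: how many times each operation occurred."""
    return Counter(seq)

def transition_counts(seq):
    """Order-aware view: how many times each ordered pair occurred."""
    return Counter(zip(seq, seq[1:]))

usual = ["foo", "bar", "foobar"]      # the user's habitual pattern
attacker = ["bar", "foo", "foobar"]   # same operations, different order

same_operations = operation_counts(usual) == operation_counts(attacker)
same_transitions = transition_counts(usual) == transition_counts(attacker)
```

Here `same_operations` is true while `same_transitions` is false: the habitual sequence yields the pairs ("foo", "bar") and ("bar", "foobar"), while the attacker's sequence yields ("bar", "foo") and ("foo", "foobar").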

In conclusion, correctly recording the order in which the user carries out the operations makes it possible to build a probabilistic model capable of quantifying the probability that a certain pattern belongs to the user. This holds regardless of the probable presence of individual known elements. Indeed, the same operations (for example, the only authorized ones) could be performed following unexplored or rarely used patterns, revealing an anomaly where the operations themselves do not change but only their order of execution does.

In particular, the baseline module 26 operates as follows. As a preliminary remark, behavior can be anomalous with respect to a normality that can be defined in two ways: with respect to the past of the same user, or with respect to the present of the other users. It is therefore possible to use two distinct benchmarks: an individual one, comparing the user with himself in past windows, and a collective one, comparing the user with the other users at a fixed time.

Indeed, although clustering and anomaly detection algorithms can help filter out false alarms, relying only on the user's "personal history" to identify anomalous behavior would always carry the risk of a "false positive". For this reason, the analysis is also extended horizontally, with respect to the other users, as well as vertically, over the past history of the analyzed user. Using the data of users "similar" to the one under analysis, the baseline module 26 is able to calculate a z-score value. On the basis of this z-score value, a certain behavior of the user can be considered "normal" or not with respect to the behavior of colleagues in the same time window.

We start with the comparison between the user and his past, and therefore fix u = u*. For each activity a, that is, for each ordered pair of emissions (e, e'), we can calculate the mean and the standard deviation of the frequencies of the observed activities, according to the following formulas, respectively:
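The formulas themselves are not reproduced in this text. A reconstruction consistent with the surrounding definitions is sketched below. It assumes that F is the number of learning windows, that Λ_{u*af} is the tensor element for user u*, activity a and window f, and that the standard deviation uses the population normalization 1/F; the symbols μ and σ are chosen here for illustration.

```latex
\[
\mu_{u^*a} = \frac{1}{F}\sum_{f=1}^{F}\Lambda_{u^*af},
\qquad
\sigma_{u^*a} = \sqrt{\frac{1}{F}\sum_{f=1}^{F}\left(\Lambda_{u^*af}-\mu_{u^*a}\right)^{2}}
\]
```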

These two quantities express, respectively, the normality, that is, how often on average the user u* has performed the activity a, and the unit of measurement of the distance from that normality that can be tolerated, so that a deviation within it is regarded as a reasonable statistical fluctuation.

In a completely analogous way, it is possible to compare the user with the other users, thus evaluating any anomalies not with respect to the past of the user u* but with respect to the behavior of the other users, and therefore fixing a certain window f*.

By averaging no longer over the windows for a fixed user, but over the users for a fixed window, it is possible to obtain an analysis no longer of the individual space but of the collective space, in which users are compared with one another rather than each with his own past. The mean and the standard deviation of the frequencies of the observed activities are calculated according to the following formulas, respectively:
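The formulas themselves are not reproduced in this text. A reconstruction consistent with the surrounding definitions is sketched below. It assumes that U is the number of users being compared, that Λ_{uaf*} is the tensor element for user u, activity a and the fixed window f*, and that the standard deviation uses the population normalization 1/U; the symbols μ and σ are chosen here for illustration.

```latex
\[
\mu_{af^*} = \frac{1}{U}\sum_{u=1}^{U}\Lambda_{uaf^*},
\qquad
\sigma_{af^*} = \sqrt{\frac{1}{U}\sum_{u=1}^{U}\left(\Lambda_{uaf^*}-\mu_{af^*}\right)^{2}}
\]
```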

Now suppose that, in the test period, we observe a certain frequency φ for the activity a of the user u* in the time window f*. Assuming a Gaussian distribution of the frequencies, we can calculate how unlikely it is to observe φ with respect to the measures of normality introduced above. In particular, the z-score can be calculated with respect to the individual space and with respect to the collective space, according to the following formulas, respectively:
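The formulas themselves are not reproduced in this text. The standard z-score definition, applied to the two benchmarks, would read as below. The notation is an assumption: μ_{u*a} and σ_{u*a} denote the mean and standard deviation over the past windows of user u*, and μ_{af*} and σ_{af*} denote the mean and standard deviation over the users in window f*.

```latex
\[
z^{\mathrm{ind}}_{u^*a} = \frac{\varphi-\mu_{u^*a}}{\sigma_{u^*a}},
\qquad
z^{\mathrm{coll}}_{af^*} = \frac{\varphi-\mu_{af^*}}{\sigma_{af^*}}
\]
```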

Under the hypotheses of the Central Limit Theorem, the variables z tend to a Gaussian as F or U, respectively, increases. For sufficiently large F or U, it is therefore possible to associate with each z a probability for each φ. This probability is a measure of the anomaly of the single frequency observed during the test phase.
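As a minimal sketch of this step, the code below computes a z-score against a list of reference frequencies and converts it into a two-sided Gaussian tail probability. The function names, the use of the population standard deviation, the handling of a zero standard deviation and the sample numbers are all assumptions made for illustration.

```python
import math
from statistics import mean, pstdev

def z_score(phi, reference):
    """z-score of the observed frequency phi against reference frequencies.

    For the individual benchmark, `reference` holds the frequencies of one
    activity for one user across past windows; for the collective benchmark,
    it holds the frequencies of one activity across users in one window.
    """
    mu = mean(reference)
    sigma = pstdev(reference)  # population standard deviation (assumption)
    if sigma == 0:
        # Degenerate baseline: any deviation is infinitely many sigmas away.
        return 0.0 if phi == mu else math.copysign(math.inf, phi - mu)
    return (phi - mu) / sigma

def two_sided_tail_probability(z):
    """P(|Z| >= |z|) for a standard normal variable Z."""
    return math.erfc(abs(z) / math.sqrt(2))

# Hypothetical learning-period frequencies of one activity for one user.
history = [4, 5, 6, 5, 4, 6]
z_typical = z_score(5, history)
z_unusual = z_score(9, history)
p_unusual = two_sided_tail_probability(z_unusual)
```

A small tail probability means that the observed frequency is unlikely under the Gaussian assumption, so it can serve as the anomaly measure for that single frequency.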

Note that the above can be generalized to sequences of emissions longer than two, at the cost of increasing the size of the tensor along the activities dimension, that is, of increasing A.
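A minimal sketch of this generalization follows, assuming that an activity is redefined as a run of n consecutive emissions; the function name and the parameter n are hypothetical. With T distinct emissions, the number of possible activities grows from at most T^2 to at most T^n.

```python
from collections import Counter

def ngram_counts(seq, n=2):
    """Count ordered runs of n consecutive emissions in a sequence."""
    return Counter(tuple(seq[i:i + n]) for i in range(len(seq) - n + 1))

pairs = ngram_counts(["foo", "bar", "foobar"], n=2)
triples = ngram_counts(["foo", "bar", "foobar"], n=3)
```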

The historical anomaly verification module 28 of the central computer 12 is configured to evaluate the presence of a possible behavioral anomaly of a user u, or more generally of an entity, with respect to the individual space, that is, with respect to the history of the past activities of the user u, on the basis of the individual z-score value previously calculated by the baseline module 26.

The peer anomaly verification module 30 of the central computer 12 is configured to evaluate the presence of a possible behavioral anomaly of a user u, or more generally of an entity, with respect to the collective space, that is, with respect to the current activities of the other similar users, called peers, on the basis of the collective z-score value previously calculated by the baseline module 26.

In one embodiment, when evaluating the presence of an anomaly, the historical anomaly verification module 28 and the peer anomaly verification module 30 use known clustering and anomaly detection techniques, such as the Rodriguez-Laio algorithm or ABOD (Angle-Based Outlier Detection), which in this context can be used to highlight possible anomalies in these z-scores.

By way of example, suppose that the baseline module 26 has calculated, for the reference time window F* and after having learned the behavior of a user U* over a sufficiently long preceding period, a z-score value large enough in absolute value to represent an anomaly. We therefore assume that the historical anomaly verification module 28 has reported a behavior that differs significantly from the one considered normal. At this point we can highlight and report that the user U* has carried out one or more activities within the window F* that make his behavior anomalous with respect to his usual way of operating.

Now suppose that the learning uses the behavior of all the users "similar" to U*, but only within the window F*. We therefore concentrate on the behavior of the group within the window F* alone, evaluating the behavior of our user U* against that of his own group rather than against his past, as was done previously.

In this case too the behavior of the user U* can be evaluated, but this time the baseline consists of the behavior of all the other users in the same window. These users have been chosen only among peers, possibly through a dedicated clustering algorithm, so a greater homogeneity of behavior is expected than would be obtained by selecting all users. The result of the anomaly detection algorithm therefore indicates how anomalous the behavior of the user U* is for the period F* compared with what his peers did in the same period.

Since the user U*, in the example under consideration, has already been reported as anomalous with respect to his past, this anomaly can now either be confirmed with respect to the group as well, or not.

If the user U* appears relatively anomalous for the group as well, this could confirm that the series of activities that led the user U* to be flagged as anomalous with respect to his own past is likewise not found in the group of his peers. We can therefore exclude that a new system, a new procedure or an update of an existing company workflow has come into play and suddenly changed the way in which some of the activities within the company are carried out. If that were the case, the change would certainly have been observed in some other peer of the user U* as well, and the user U* would therefore not have been anomalous with respect to the group.

Conversely, if the user U* appears relatively aligned with the group, this could confirm that we are facing a "perturbation" of the computer system, of the company procedures or of the operating methods that has affected the way work is carried out on the systems, causing widespread "baseline changes". It is also unlikely that multiple users have agreed to carry out a fraudulent action all at the same time.

In this sense, the output of the analysis on the collective baseline can act as noise reduction on what the analysis on the individual baseline has detected, in particular with a substantial reduction of false positives.

With reference to Figure 2, the operation of an embodiment of the system for creating and verifying behavioral baselines, that is, an embodiment of the method for creating and verifying behavioral baselines, according to the invention is described below.

Initially, at step 70, the raw event collection module 16 of the central computer 12 collects and normalizes the raw data on the events recorded by the target devices 36 connected to the central computer 12, treating them as emissions of those same target devices 36.

At step 72, preferably simultaneously with step 70, the IAM status collection module 18 of the central computer 12 collects the data on the current status recorded by the IAM device 38, which manages the user accounts and the authorizations of those accounts. In this way, an updated mapping of all the accounts of the user on the target devices 36 is obtained from the IAM device 38.

At step 74, the data enrichment module 20 of the central computer 12 cross-references the raw event data collected at step 70 with the IAM status data collected at step 72, identifying and grouping the raw events associated with a specific user, or more generally with an entity, regardless of the number of accounts that the same user holds.

Raw events always contain information about the user who generated the emission. Depending on the target device 36, this information about the "author" of the emission may include a source IP address, a hostname, a username, or a combination of these. In particular, the username can always be traced back to a user thanks to the constantly updated mapping obtained from the IAM device 38.

By integrating all the information referred to in step 74, the data enrichment module 20 traces back to the user who operates by means of a certain username, who is using a certain IP address, or who is tracked through a certain hostname.
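The enrichment step can be sketched as a set of lookups. Everything in the example is an assumption made for illustration: the event field names, the three lookup tables (the account table standing in for the mapping obtained from the IAM device, the IP and hostname tables standing in for whatever source associates them with a user), the order in which they are consulted, and the sample data.

```python
from collections import defaultdict

def resolve_user(event, account_owner, ip_owner, host_owner):
    """Return the physical user behind a raw event, or None if unknown."""
    account = (event.get("target"), event.get("username"))
    if account in account_owner:            # account -> user (IAM mapping)
        return account_owner[account]
    if event.get("source_ip") in ip_owner:  # source IP -> user
        return ip_owner[event["source_ip"]]
    if event.get("hostname") in host_owner: # hostname -> user
        return host_owner[event["hostname"]]
    return None

def group_events_by_user(events, account_owner, ip_owner, host_owner):
    """Group raw events by user, regardless of the account that was used."""
    grouped = defaultdict(list)
    for event in events:
        user = resolve_user(event, account_owner, ip_owner, host_owner)
        if user is not None:
            grouped[user].append(event)
    return grouped

# Hypothetical data: one user holding two accounts on different targets.
account_owner = {("mail", "mrossi"): "Mario Rossi",
                 ("erp", "rossi_m"): "Mario Rossi"}
ip_owner = {"10.0.0.7": "Mario Rossi"}
host_owner = {}
raw_events = [
    {"target": "mail", "username": "mrossi", "action": "login"},
    {"target": "erp", "username": "rossi_m", "action": "export"},
    {"target": "proxy", "source_ip": "10.0.0.7", "action": "GET"},
    {"target": "proxy", "source_ip": "10.0.0.99", "action": "GET"},
]
by_user = group_events_by_user(raw_events, account_owner, ip_owner, host_owner)
```

In this sketch the three events attributable to the same person end up in one group, while the event from the unknown IP address is left unassigned.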

At step 76, the Markov module 24 of the central computer 12 builds a Markov chain, or more precisely a Markov transition matrix, that tracks the passage from one emission to the next, that is, from one activity to the temporally subsequent one, both activities being defined by the enriched data and carried out by a specific physical user on the target devices 36. Clearly, the size of the matrix depends on how many emissions are considered at a time.

The operation of the Markov module 24 has already been described above, and reference is made to the corresponding paragraphs for further detail.

At step 78, the baseline module 26 analyzes the transitions between the activities carried out by a specific physical user, or more generally by an entity, by means of known machine learning techniques, starting from the Markov transition matrix for each user u and each time window f (for example one day). In particular, the baseline module 26 is configured to calculate a plurality of individual z-score values, one for each activity-entity pair (the entity being, for example, a user), and a plurality of collective z-score values, one for each activity-window pair. In practice, the z-score values represent the probability that an activity or a sequence of activities of a specific physical user, or more generally of an entity, constitutes a behavioral anomaly. This provides a quantitative index for the observed series of emissions.

The operation of the baseline module 26 has already been described above, and reference is made to the corresponding paragraphs for further detail.

At step 80, the historical anomaly verification module 28 evaluates the presence of a possible behavioral anomaly of a user u, or more generally of an entity, with respect to the individual space, that is, with respect to the history of the past activities of the user u, on the basis of the individual z-score value previously calculated at step 78.

If the value of the individual z-score exceeds a predefined threshold, the behavior of the user u is to be considered anomalous with respect to the history of the past activities of the user u.

At step 82, the peer anomaly verification module 30 evaluates the presence of a possible behavioral anomaly of a user u, or more generally of an entity, with respect to the collective space, that is, with respect to the current activities of the other similar users, called peers, on the basis of the collective z-score value previously calculated at step 78.

If the value of the collective z-score exceeds a predefined threshold, the behavior of the user u is to be considered anomalous with respect to the current activities of the peers.

In one embodiment, when evaluating the presence of an anomaly, the historical anomaly verification module 28 and the peer anomaly verification module 30 use known clustering and anomaly detection techniques, such as the Rodriguez-Laio algorithm or ABOD (Angle-Based Outlier Detection), which highlight the anomalies in these z-scores.

Following the analysis of step 78, a user, or more generally an entity, may or may not be anomalous with respect to his past behavior. In the event of an anomaly, an anomaly index is also associated with the user. Furthermore, the user could be the only anomalous one in the time window under consideration, or there could be others within the group of his peers (users similar to him). This yields four possible outcomes, depending on how the results of the evaluations of steps 80 and 82 combine.
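The combination of the two evaluations can be sketched as a two-by-two classification. The enum names, the comparison on the absolute value of each z-score and the default threshold of 3.0 are assumptions introduced for illustration; the description only states that predefined thresholds are used.

```python
from enum import Enum

class Outcome(Enum):
    ALIGNED_WITH_HISTORY_AND_PEERS = "aligned with history and with peers"
    ANOMALOUS_VS_PEERS_ONLY = "aligned with history, anomalous vs peers"
    ANOMALOUS_VS_HISTORY_ONLY = "anomalous vs history, aligned with peers"
    ANOMALOUS_VS_HISTORY_AND_PEERS = "anomalous vs history and vs peers"

def classify(z_individual, z_collective,
             individual_threshold=3.0, collective_threshold=3.0):
    """Combine the individual and collective checks into one outcome.

    The thresholds are placeholders (assumption), not values from the text.
    """
    anomalous_vs_history = abs(z_individual) > individual_threshold
    anomalous_vs_peers = abs(z_collective) > collective_threshold
    if anomalous_vs_history and anomalous_vs_peers:
        return Outcome.ANOMALOUS_VS_HISTORY_AND_PEERS
    if anomalous_vs_history:
        return Outcome.ANOMALOUS_VS_HISTORY_ONLY
    if anomalous_vs_peers:
        return Outcome.ANOMALOUS_VS_PEERS_ONLY
    return Outcome.ALIGNED_WITH_HISTORY_AND_PEERS

example = classify(z_individual=4.2, z_collective=0.5)
```

With these sample values the user deviates from his own history but not from his peers, which is the combination in which the individual alarm is most likely to be a false positive.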

In the case of the first outcome 84, the activity of the user is aligned with his past and also with the peers (who are likewise not anomalous) in the current window. In this case, therefore, the activity appears entirely normal and in line with the expected result.

In the case of the second outcome 85, the user's activity is aligned with its own past, but it deviates from its peers (which are instead anomalous) in the current window. In this case, therefore, many users have changed their usual behavior, but it is unlikely that many users are carrying out fraudulent activity at the same time. It is more likely that a change of procedures or an application update has caused a widespread change in how the target devices 36, or one specific target device 36, are used. The selected user, who is the only one not to be anomalous, has probably not adapted to the change imposed on the others. This information can be used to improve the active and prompt participation of users in changes of procedure, and therefore security.

In the case of the third outcome 86, the user's activity is anomalous with respect to its own past, but aligned with its peers (which are also anomalous) in the current window. In this case, therefore, many users have changed their usual behavior. Reporting the user's anomaly could result in a false positive, since it is unlikely that many similar users are carrying out fraudulent activity at the same time. It is more likely that a change of procedures or an application update has caused a widespread change in how the target devices 36, or one specific target device 36, are used. With the third outcome 86, the user's anomaly is considered mitigated by virtue of noise reduction.

Finally, in the case of the fourth outcome 87, the user's activity is anomalous with respect to its own past and also with respect to its peers (which are instead not anomalous) in the current window. The observed user has therefore performed activities that are anomalous compared with those it usually performs, while similar users have not changed their activity. This is the scenario that most likely deserves to be investigated and reported, since it could be fraudulent behavior or behavior that is otherwise dangerous for the computer system.
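The four outcomes above can be summarized as a two-input decision table. The following sketch is illustrative only: the names `Outcome` and `classify`, and the reduction of the results of steps 80 and 82 to two boolean flags, are assumptions and not part of the described system.

```python
from enum import Enum


class Outcome(Enum):
    """Hypothetical labels; the values reuse reference numerals 84 to 87."""
    NORMAL = 84              # user and peers both non-anomalous
    NOT_ADAPTED = 85         # user unchanged, peers anomalous
    MITIGATED = 86           # user and peers both anomalous
    TO_BE_INVESTIGATED = 87  # user anomalous, peers non-anomalous


def classify(user_anomalous: bool, peers_anomalous: bool) -> Outcome:
    """Combine the history check and the peer check.

    Assumption: each check has already been reduced to a boolean flag.
    """
    if not user_anomalous:
        return Outcome.NOT_ADAPTED if peers_anomalous else Outcome.NORMAL
    return Outcome.MITIGATED if peers_anomalous else Outcome.TO_BE_INVESTIGATED
```

For example, `classify(True, False)` returns `Outcome.TO_BE_INVESTIGATED`, the only combination that the description singles out as most likely deserving a report.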

In practice it has been found that the invention fully achieves the intended aim and objects. In particular, the system and method for creating and verifying behavioral baselines thus conceived overcome the qualitative limits of the known art, since they make it possible to create and verify behavioral baselines based on activities or sequences of activities carried out by the entities (both users and apparatuses) operating within a corporate network, in order to identify any behavioral deviations that could represent harmful activities and cyber threats.

One advantage of the system and method for creating and verifying behavioral baselines according to the present invention is that they provide a dynamic, adaptive and proactive method of detecting cyber threats, in order to counter external and internal threats that are continuously evolving and unknown a priori.

Another advantage of the system and method for creating and verifying behavioral baselines according to the present invention is that they provide a method of detecting behavioral anomalies that is not driven by signatures or pre-established policies, but adapts to the behavior of the entities and to the use of the company information systems.

A further advantage of the system and method for creating and verifying behavioral baselines according to the present invention is that they minimize false positives, thus increasing the effectiveness of the actions taken in response to a cyber attack or threat.

Yet another advantage of the system and method for creating and verifying behavioral baselines according to the present invention is that they do not merely correlate the activities of accounts, but correlate the activities of the individual physical user, regardless of the number of accounts assigned to that user.

Although the system and method for creating and verifying behavioral baselines according to the invention have been conceived in particular to increase the IT security of corporate or infrastructural systems and networks, they can nevertheless be used, more generally, to increase the IT security of the systems and networks of any medium-sized or large organization.

The invention thus conceived is susceptible of numerous modifications and variations, all of which fall within the scope of the inventive concept. Furthermore, all the details may be replaced by other technically equivalent elements.

In practice, the materials employed, provided they are compatible with the specific use, as well as the contingent shapes and dimensions, may be any according to requirements and to the state of the art.

In conclusion, the scope of protection of the claims must not be limited by the illustrations or by the preferred embodiments given in the description by way of example; rather, the claims must encompass all the features of patentable novelty that reside in the present invention, including all the features that would be treated as equivalents by the person skilled in the art.

Claims

1. System (10) for creating and verifying behavioral baselines, comprising a central processing device (12) comprising a control unit (14) and enriched data storage means (22), connected to and in communication with a plurality of target apparatuses (36) and with an Identity & Access Management or IAM apparatus (38), characterized in that said central processing device (12) comprises:
- a Markov module (24), configured to construct a matrix of Markov transitions capable of tracing the transition from a first activity to a temporally subsequent second activity, both said activities being defined by enriched data and carried out by an entity on said target apparatuses (36);
- a baseline module (26), configured to calculate a plurality of individual z-score values, one for each single activity and entity pair, and a plurality of collective z-score values, one for each single activity and time window pair;
- a historical anomaly verification module (28), configured to evaluate the presence of an anomaly in the behavior of said entity with respect to an individual space, or with respect to the history of the past activities of said entity, on the basis of said plurality of individual z-score values; and
- a peer anomaly verification module (30), configured to evaluate the presence of an anomaly in the behavior of said entity with respect to a collective space, or with respect to the current activities of other similar peer entities, on the basis of said plurality of collective z-score values.

2. System (10) for the creation and verification of behavioral baselines according to claim 1, characterized in that said individual z-score value is calculated according to the formula:

where:

3. System (10) for the creation and verification of behavioral baselines according to claim 1 or 2, characterized in that said collective z-score value is calculated according to the formula:

where:

4. System (10) for the creation and verification of behavioral baselines according to any one of the preceding claims, characterized in that said central processing device (12) further comprises a raw event collection module (16) configured to collect and normalize raw data on the events recorded by said target apparatuses (36).

5. System (10) for the creation and verification of behavioral baselines according to any one of the preceding claims, characterized in that said central processing device (12) further comprises an IAM status collection module (18) configured to collect data on the current status registered by said IAM apparatus (38), obtaining an updated mapping of all the accounts of said entity on said target apparatuses (36).

6. System (10) for the creation and verification of behavioral baselines according to any one of the preceding claims, characterized in that said central processing device (12) further comprises a data enrichment module (20) configured to cross-reference said raw data on events, collected by said raw event collection module (16), and said IAM status data, collected by said IAM status collection module (18), identifying and grouping raw events associated with said entity regardless of the number of accounts of which that entity is the owner.

7. Method for creating and verifying behavioral baselines, by means of a central processing device (12) comprising a control unit (14) and enriched data storage means (22), connected to and in communication with a plurality of target apparatuses (36) and with an Identity & Access Management or IAM apparatus (38), comprising the steps which consist in:
- constructing (76) a matrix of Markov transitions capable of tracing the transition from a first activity to a temporally subsequent second activity, by means of a Markov module (24) included in said central processing device (12), both said activities being defined by said enriched data and carried out by an entity on said target apparatuses (36);
- calculating (78) a plurality of individual z-score values, one for each single activity and entity pair, and a plurality of collective z-score values, one for each single activity and time window pair, by means of a baseline module (26) included in said central processing device (12);
- evaluating (80) the presence of an anomaly in the behavior of said entity with respect to an individual space, or with respect to the history of the past activities of said entity, on the basis of said plurality of individual z-score values, by means of a historical anomaly verification module (28) included in said central processing device (12); and
- evaluating (82) the presence of an anomaly in the behavior of said entity with respect to a collective space, or with respect to the current activities of other similar peer entities, on the basis of said plurality of collective z-score values, by means of a peer anomaly verification module (30) included in said central processing device (12).

8. Method for creating and verifying behavioral baselines according to claim 7, characterized in that said individual z-score value is calculated according to the formula:

where:

9. Method for the creation and verification of behavioral baselines according to claim 7 or 8, characterized in that said collective z-score value is calculated according to the formula:

where:

10. Method for creating and verifying behavioral baselines according to any one of claims 7 to 9, characterized in that it comprises the step which consists in cross-referencing (74) said event data, collected by said raw event collection module (16), and said IAM status data, collected by said IAM status collection module (18), by means of a data enrichment module (20) included in said central processing device (12), identifying and grouping events associated with said entity regardless of the number of accounts that entity owns.

11. Method for the creation and verification of behavioral baselines according to any one of claims 7 to 10, characterized in that it comprises the step which consists in collecting and normalizing (70) raw data on the events recorded by said target apparatuses (36), by means of a raw event collection module (16) included in said central processing device (12).

12. Method for creating and verifying behavioral baselines according to any one of claims 7 to 11, characterized in that it comprises the step which consists in collecting (72) data on the current state recorded by said IAM apparatus (38), by means of an IAM status collection module (18) included in said central processing device (12), obtaining an updated mapping of all the accounts of said entity on said target apparatuses (36).