EP4397002A1 - Generation of user-specific polygraphs for network activity - Google Patents
Generation of user-specific polygraphs for network activity
- Publication number
- EP4397002A1 (application EP22778115.0A)
- Authority
- EP
- European Patent Office
- Prior art keywords
- data
- user
- nodes
- polygraph
- graph
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/14—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
- H04L63/1408—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/50—Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
- G06F21/55—Detecting local intrusion or implementing counter-measures
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/50—Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
- G06F21/55—Detecting local intrusion or implementing counter-measures
- G06F21/552—Detecting local intrusion or implementing counter-measures involving long-term monitoring or reporting
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/50—Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
- G06F21/55—Detecting local intrusion or implementing counter-measures
- G06F21/554—Detecting local intrusion or implementing counter-measures involving event detection and direct action
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/14—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
- H04L63/1408—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
- H04L63/1416—Event detection, e.g. attack signature detection
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/14—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
- H04L63/1408—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
- H04L63/1425—Traffic logging, e.g. anomaly detection
Definitions
- Fig. 1A shows an illustrative configuration in which a data platform is configured to perform various operations with respect to a cloud environment that includes a plurality of compute assets.
- Fig. 1B shows an illustrative implementation of the configuration of Fig. 1A.
- Fig. 1C illustrates an example computing device.
- Fig. 1D illustrates an example of an environment in which activities that occur within datacenters are modeled.
- Fig. 2A illustrates an example of a process, used by an agent, to collect and report information about a client.
- Fig. 2B illustrates a 5-tuple of data collected by an agent, physically and logically.
- Fig. 2C illustrates a portion of a polygraph.
- Fig. 2D illustrates a portion of a polygraph.
- Fig. 2E illustrates an example of a communication polygraph.
- Fig. 2F illustrates an example of a polygraph.
- Fig. 2G illustrates an example of a polygraph as rendered in an interface.
- Fig. 2H illustrates an example of a portion of a polygraph as rendered in an interface.
- Fig. 2I illustrates an example of a portion of a polygraph as rendered in an interface.
- Fig. 2M illustrates an example of a privilege change graph as rendered in an interface.
- Fig. 2O illustrates an example of a machine server graph as rendered in an interface.
- Fig. 3A illustrates an example of a process for detecting anomalies in a network environment.
- Fig. 3B depicts a set of example processes communicating with other processes.
- Fig. 3H illustrates an example of a process for performing extended user tracking.
- Fig. 3I is a representation of a user logging into a first machine, then into a second machine from the first machine, and then making an external connection.
- Fig. 3K illustrates example records.
- Fig. 3L illustrates example output from performing an ssh connection match.
- Fig. 3O illustrates example records.
- Fig. 4C illustrates an embodiment of a portion of an insider behavior graph.
- Fig. 17 sets forth an example of a user-specific polygraph in accordance with some embodiments of the present disclosure.
- Compute assets 16 may include, but are not limited to, containers (e.g., container images, deployed and executing container instances, etc.), virtual machines, workloads, applications, processes, physical machines, compute nodes, clusters of compute nodes, software runtime environments (e.g., container runtime environments), and/or any other virtual and/or physical compute resource that may reside in and/or be executed by one or more computer resources in cloud environment 14.
- one or more compute assets 16 may reside in one or more datacenters.
- Data platform 12 may be configured to perform one or more data security monitoring and/or remediation services, compliance monitoring services, anomaly detection services, DevOps services, compute asset management services, and/or any other type of data analytics service as may serve a particular implementation.
- Data platform 12 may be managed or otherwise associated with any suitable data platform provider, such as a provider of any of the data analytics services described herein.
- the various resources included in data platform 12 may reside in the cloud and/or be located on-premises and be implemented by any suitable combination of physical and/or virtual compute resources, such as one or more computing devices, microservices, applications, etc.
- a data warehouse may be embodied as an analytic database (e.g., a relational database) that is created from two or more data sources. Such a data warehouse may be leveraged to store historical data, often on the scale of petabytes. Data warehouses may have compute and memory resources for running complicated queries and generating reports. Data warehouses may be the data sources for business intelligence (‘BI’) systems, machine learning applications, and/or other applications. By leveraging a data warehouse, data that has been copied into the data warehouse may be indexed for good analytic query performance, without affecting the write performance of a database (e.g., an Online Transaction Processing (‘OLTP’) database). Data warehouses also enable joining data from multiple sources for analysis. For example, a sales OLTP application probably has no need to know about the weather at various sales locations, but sales predictions could take advantage of that data. By adding historical weather data to a data warehouse, it would be possible to factor it into models of historical sales data.
- Data processing resources 20 may be configured to perform various data processing operations with respect to data ingested by data ingestion resources 18, including data ingested and stored in data store 30.
- data processing resources 20 may be configured to perform one or more data security monitoring and/or remediation operations, compliance monitoring operations, anomaly detection operations, DevOps operations, compute asset management operations, and/or any other type of data analytics operation as may serve a particular implementation.
- one or more operations performed by data processing resources 20 may be performed in substantially real-time (or near real-time) as data is ingested into data platform 12.
- the results of such operations (e.g., one or more detected anomalies in the data) may be provided to one or more external entities (e.g., computing device 24 and/or one or more users) in substantially real-time and/or in near real-time.
- data ingestion resources 18 may interface with long term storage 42, as illustrated by arrow 44.
- Long term storage 42 may be implemented by any suitable type of storage resources, such as cloud-based storage (e.g., AWS S3, etc.) and/or on-premises storage and may be used by data ingestion resources 18 as part of the data ingestion process. Examples of this are described herein.
- data platform 12 may not utilize long term storage 42.
- a component such as a processor or a memory described as being configured to perform a task may be implemented as a general component that is temporarily configured to perform the task at a given time or a specific component that is manufactured to perform the task.
- processor refers to one or more devices, circuits, and/or processing cores configured to process data, such as computer program instructions.
- a non-transitory computer-readable medium storing computer-readable instructions may be provided in accordance with the principles described herein.
- the instructions when executed by a processor of a computing device, may direct the processor and/or computing device to perform one or more operations, including one or more of the operations described herein.
- Such instructions may be stored and/or transmitted using any of a variety of known computer-readable media.
- a non-transitory computer-readable medium as referred to herein may include any non-transitory storage medium that participates in providing data (e.g., instructions) that may be read and/or executed by a computing device (e.g., by a processor of a computing device).
- a non-transitory computer-readable medium may include, but is not limited to, any combination of non-volatile storage media and/or volatile storage media.
- Exemplary non-volatile storage media include, but are not limited to, read-only memory, flash memory, a solid-state drive, a magnetic storage device, ferroelectric random-access memory (‘RAM’), and an optical disc (e.g., a compact disc, a digital video disc, a Blu-ray disc, etc.). Exemplary volatile storage media include, but are not limited to, RAM (e.g., dynamic RAM).
- FIG. 1C illustrates an example computing device 50 that may be specifically configured to perform one or more of the processes described herein. Any of the systems, microservices, computing devices, and/or other components described herein may be implemented by computing device 50.
- computing device 50 may include a communication interface 52, a processor 54, a storage device 56, and an input/output (“I/O”) module 58 communicatively connected one to another via a communication infrastructure 60. While an exemplary computing device 50 is shown in Fig. 1C, the components illustrated in Fig. 1C are not intended to be limiting. Additional or alternative components may be used in other embodiments. Components of computing device 50 shown in Fig. 1C will now be described in additional detail.
- Communication interface 52 may be configured to communicate with one or more computing devices.
- Examples of communication interface 52 include, without limitation, a wired network interface (such as a network interface card), a wireless network interface (such as a wireless network interface card), a modem, an audio/video connection, and any other suitable interface.
- Processor 54 generally represents any type or form of processing unit capable of processing data and/or interpreting, executing, and/or directing execution of one or more of the instructions, processes, and/or operations described herein. Processor 54 may perform operations by executing computer-executable instructions 62 (e.g., an application, software, code, and/or other executable data instance) stored in storage device 56.
- Storage device 56 may include one or more data storage media, devices, or configurations and may employ any type, form, and combination of data storage media and/or device. For example, storage device 56 may include, but is not limited to, any combination of the non-volatile media and/or volatile media described herein.
- Fig. 1D illustrates an example implementation 100 of configuration 10.
- one or more components shown in Fig. 1D may implement one or more components shown in Fig. 1A and/or Fig. 1B.
- implementation 100 illustrates an environment in which activities that occur within datacenters are modeled using data platform 12.
- a baseline of datacenter activity can be modeled, and deviations from that baseline can be identified as anomalous.
- Anomaly detection can be beneficial in a security context, a compliance context, an asset management context, a DevOps context, and/or any other data analytics context as may serve a particular implementation.
- a datacenter may include dedicated equipment (e.g., owned and operated by entity A, or owned/leased by entity A and operated exclusively on entity A's behalf by a third party).
- a datacenter can also include cloud-based resources, such as infrastructure as a service (IaaS), platform as a service (PaaS), and/or software as a service (SaaS) elements.
- the techniques described herein can be used in conjunction with multiple types of datacenters, including ones wholly using dedicated equipment, ones that are entirely cloud-based, and ones that use a mixture of both dedicated equipment and cloud-based resources.
- Both datacenter 104 and datacenter 106 include a plurality of nodes, depicted collectively as set of nodes 108 and set of nodes 110, respectively, in Fig. 1D. These nodes may implement compute assets 16. Installed on each of the nodes are in-server / in-virtual machine (VM) / embedded-in-IoT-device agents (e.g., agent 112), which are configured to collect data and report it to data platform 12 for analysis. As described herein, agents may be small, self-contained binaries that can be run on any appropriate platforms, including virtualized ones (and, as applicable, within containers). Agents may monitor the nodes on which they execute for a variety of different activities, including: connection, process, user, machine, and file activities.
- Agents can be executed in user space, and can use a variety of kernel modules (e.g., auditd, iptables, netfilter, pcap, etc.) to collect data. Agents can be implemented in any appropriate programming language, such as C or Golang, using applicable kernel APIs.
- agents can selectively report information to data platform 12 in varying amounts of detail and/or with variable frequency.
- the data collected by agents may be used by data platform 12 to create polygraphs, which are graphs of logical entities, connected by behaviors.
- agents report information directly to data platform 12.
- at least some agents provide information to a data aggregator, such as data aggregator 114, which in turn provides information to data platform 12.
- a data aggregator can be implemented as a separate binary or other application (distinct from an agent binary), and can also be implemented by having an agent execute in an “aggregator mode” in which the designated aggregator node acts as a Layer 7 proxy for other agents that do not have access to data platform 12. Further, a chain of multiple aggregators can be used, if applicable (e.g., with agent 112 providing data to data aggregator 114, which in turn provides data to another aggregator (not pictured) which provides data to data platform 12).
- An example way to implement an aggregator is through a program written in an appropriate language, such as C or Golang.
- Use of an aggregator can be beneficial in sensitive environments (e.g., involving financial or medical transactions) where various nodes are subject to regulatory or other architectural requirements (e.g., prohibiting a given node from communicating with systems outside of datacenter 104). Use of an aggregator can also help to minimize security exposure more generally. As one example, by limiting communications with data platform 12 to data aggregator 114, individual nodes in nodes 108 need not make external network connections (e.g., via Internet 124), which can potentially expose them to compromise (e.g., by other external devices, such as device 118, operated by a criminal). Similarly, data platform 12 can provide updates, configuration information, etc., to data aggregator 114 (which in turn distributes them to nodes 108), rather than requiring nodes 108 to allow incoming connections from data platform 12 directly.
- Another benefit of an aggregator model is that network congestion can be reduced (e.g., with a single connection being made at any given time between data aggregator 114 and data platform 12, rather than potentially many different connections being open between various of nodes 108 and data platform 12). Similarly, network consumption can also be reduced (e.g., with the aggregator applying compression techniques/bundling data received from multiple agents).
- agent 112 (installed on node 116) may send information to data aggregator 114 using data serialization protocols such as Apache Avro.
- status information is sent by an agent periodically (e.g., once an hour or once any other predetermined amount of time). Alternatively, status information may be sent continuously or in response to occurrence of one or more events.
- the status information may include, but is not limited to: (a) an amount of event backlog (in bytes) that has not yet been transmitted; (b) configuration information; (c) any data loss period for which data was dropped; (d) a cumulative count of errors encountered since the agent started; (e) version information for the agent binary; and/or (f) cumulative statistics on data collection (e.g., number of network packets processed, new processes seen, etc.). A sketch of such a report follows below.
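- The status fields above map naturally onto a small report structure. The following Go sketch is illustrative only: the field names and JSON encoding are assumptions, not the agent's actual wire format (which, per the text, may use a serialization protocol such as Apache Avro):

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// AgentStatus is a hypothetical periodic status report mirroring
// items (a)-(f) above. Field names are illustrative, not the real schema.
type AgentStatus struct {
	BacklogBytes    uint64            `json:"backlog_bytes"`     // (a) untransmitted event backlog
	Config          map[string]string `json:"config"`            // (b) configuration information
	DataLossPeriods []Period          `json:"data_loss_periods"` // (c) periods where data was dropped
	ErrorCount      uint64            `json:"error_count"`       // (d) cumulative errors since agent start
	AgentVersion    string            `json:"agent_version"`     // (e) agent binary version
	Stats           CollectionStats   `json:"stats"`             // (f) cumulative collection statistics
}

type Period struct {
	Start, End time.Time
}

type CollectionStats struct {
	PacketsProcessed uint64 `json:"packets_processed"`
	NewProcessesSeen uint64 `json:"new_processes_seen"`
}

func main() {
	s := AgentStatus{AgentVersion: "1.2.3"}
	b, _ := json.Marshal(s)
	fmt.Println(string(b)) // would be sent hourly (or per another schedule)
}
```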
- a second example type of information that may be sent by agent 112 to data aggregator 114 is event data (described in more detail herein), which may include a UTC timestamp for each event.
- the agent can control the amount of data that it sends to the data aggregator in each call (e.g., a maximum of 10MB) by adjusting the amount of data sent to manage the conflicting goals of transmitting data as soon as possible, and maximizing throughput.
- Data can also be compressed (or left uncompressed, as applicable) by the agent prior to sending.
- Each data aggregator may run within a particular customer environment.
- a data aggregator (e.g., data aggregator 114) may facilitate data routing from many different agents (e.g., agents executing on nodes 108) to data platform 12.
- data aggregator 114 may implement a SOCKS 5 caching proxy through which agents can connect to data platform 12.
- data aggregator 114 can encrypt (or otherwise obfuscate) sensitive information prior to transmitting it to data platform 12, and can also distribute key material to agents which can encrypt the information (as applicable).
- Data aggregator 114 may include a local storage, to which agents can upload data (e.g., pcap packets). The storage may have a key-value interface.
- the local storage can also be omitted, and agents configured to upload data to a cloud storage or other storage area, as applicable.
- Data aggregator 114 can, in some embodiments, also cache locally and distribute software upgrades, patches, or configuration information (e.g., as received from data platform 12).
- user A may access a web frontend (e.g., web app 120) using a computer 126 and enroll (on behalf of entity A) in an account with data platform 12. After enrollment is complete, user A may be presented with a set of installers, pre-built and customized for the environment of entity A, that user A can download from data platform 12 and deploy on nodes 108.
- installers include, but are not limited to, a Windows executable file, an iOS app, a Linux package (e.g., .deb or .rpm), a binary, or a container (e.g., a Docker container).
- user B (e.g., a network administrator) may be similarly presented with a set of installers that are pre-built and customized for the environment of entity B.
- the agent may be installed in the user space (i.e., is not a kernel module), and the same binary is executed on each node of the same type (e.g., all Windows-based platforms have the same Windows-based binary installed on them).
- An illustrative function of an agent, such as agent 112, is to collect data (e.g., associated with node 116) and report it (e.g., to data aggregator 114).
- Other tasks that can be performed by agents include data configuration and upgrading.
- One approach to collecting data as described herein is to collect virtually all information available about a node (and, e.g., the processes running on it).
- the agent may monitor for network connections, and then begin collecting information about processes associated with the network connections, using the presence of a network packet associated with a process as a trigger for collecting additional information about the process.
- for an application such as a calculator application, which does not typically interact with the network, no information about use of that application may be collected by agent 112 and/or sent to data aggregator 114.
- agent 112 may collect information about the process and provide associated information to data aggregator 114.
- the agent may always collect/report information about certain events, such as privilege escalation, irrespective of whether the event is associated with network activity.
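- As a rough illustration of the trigger logic described above, the following Go sketch (with hypothetical event types and function names) collects process information only once network activity is seen for a process, while always reporting certain event types such as privilege escalation:

```go
package main

import "fmt"

type Event struct {
	PID  int
	Type string // e.g., "net_packet", "priv_escalation", "app_usage" (illustrative)
}

// alwaysReport lists event types reported regardless of network activity.
var alwaysReport = map[string]bool{"priv_escalation": true}

// watched tracks processes for which a network packet has been seen.
var watched = map[int]bool{}

func handle(e Event) {
	switch {
	case e.Type == "net_packet":
		// A packet associated with the process triggers further collection.
		watched[e.PID] = true
		fmt.Printf("begin collecting process info for pid %d\n", e.PID)
	case alwaysReport[e.Type]:
		fmt.Printf("report %s for pid %d (always reported)\n", e.Type, e.PID)
	case watched[e.PID]:
		fmt.Printf("report %s for pid %d (network-active process)\n", e.Type, e.PID)
	default:
		// e.g., a calculator app with no network activity: nothing is sent.
	}
}

func main() {
	handle(Event{PID: 41, Type: "app_usage"})       // dropped
	handle(Event{PID: 42, Type: "net_packet"})      // starts collection
	handle(Event{PID: 42, Type: "app_usage"})       // now reported
	handle(Event{PID: 43, Type: "priv_escalation"}) // always reported
}
```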
- An approach to collecting information is as follows, and described in conjunction with process 200 depicted in Fig. 2A.
- An agent (e.g., agent 112) monitors its node (e.g., node 116) for network activity.
- One way agent 112 can monitor node 116 for network activity is by using a network packet capture tool (e.g., listening using libpcap).
- the agent obtains and maintains (e.g., in an in-memory cache) connection information associated with the network activity (202). Examples of such information include DNS query/response, TCP, UDP, and IP information.
- the agent may also determine a process associated with the network connection (203).
- As one example, the agent can use a kernel network diagnostic API (e.g., netlink_diag) to obtain socket information, or can parse the same files netstat reads (e.g., /proc/net/tcp, /proc/net/tcp6, /proc/net/udp, and /proc/net/udp6).
- Information such as socket state (e.g., whether a socket is connected, listening, etc.) can also be collected by the agent.
- One way an agent can obtain a mapping between a given inode and a process identifier is to scan within the /proc/pid directory. For each of the processes currently running, the agent examines each of their file descriptors. If a file descriptor is a match for the inode, the agent can determine that the process associated with the file descriptor owns the inode. Once a mapping is determined between an inode and a process identifier, the mapping is cached. As additional packets are received for the connection, the cached process information is used (rather than a new search being performed).
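- A minimal Go sketch of this scan, assuming a Linux /proc filesystem, is shown below. It resolves each /proc/<pid>/fd symlink and matches the socket:[inode] form; the reverse-PID ordering approximates the "youngest process first" prioritization discussed next, and caching behavior is simplified:

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"sort"
	"strconv"
)

var inodeToPID = map[uint64]int{} // cache: inode -> owning process

// findPIDForInode scans /proc/<pid>/fd entries, highest-numbered PIDs
// first (an approximation of "youngest process first"; see the text
// for caveats such as PID wraparound).
func findPIDForInode(inode uint64) (int, bool) {
	if pid, ok := inodeToPID[inode]; ok {
		return pid, true // cached from an earlier packet on this connection
	}
	entries, err := os.ReadDir("/proc")
	if err != nil {
		return 0, false
	}
	var pids []int
	for _, e := range entries {
		if pid, err := strconv.Atoi(e.Name()); err == nil {
			pids = append(pids, pid)
		}
	}
	sort.Sort(sort.Reverse(sort.IntSlice(pids)))
	want := fmt.Sprintf("socket:[%d]", inode)
	for _, pid := range pids {
		fdDir := filepath.Join("/proc", strconv.Itoa(pid), "fd")
		fds, err := os.ReadDir(fdDir)
		if err != nil {
			continue // process exited, or insufficient permission
		}
		for _, fd := range fds {
			target, err := os.Readlink(filepath.Join(fdDir, fd.Name()))
			if err == nil && target == want {
				inodeToPID[inode] = pid // cache the mapping for later packets
				return pid, true
			}
		}
	}
	return 0, false
}

func main() {
	if pid, ok := findPIDForInode(12345); ok {
		fmt.Println("inode owned by pid", pid)
	}
}
```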
- Another example of an optimization is to prioritize searching the file descriptors of certain processes over others.
- One such prioritization is to search through the subdirectories of /proc/ starting with the youngest process.
- One approximation of such a sort order is to search through /proc/ in reverse order (e.g., examining highest numbered processes first). Higher numbered processes are more likely to be newer (i.e., not long-standing processes), and thus more likely to be associated with new connections (i.e., ones for which inode-process mappings are not already cached).
- the most recently created process may not have the highest process identifier (e.g., due to the kernel wrapping through process identifiers).
- Another example prioritization is to query the kernel for an identification of the most recently created process and to search in a backward order through the directories in /proc/ (e.g., starting at the most recently created process and working backwards, then wrapping to the highest value (e.g., 32768) and continuing to work backward from there).
- An alternate approach is for the agent to keep track of the newest process that it has reported information on (e.g., to data aggregator 114), and begin its search of /proc/ in a forward order starting from the PID of that process.
- Another example prioritization is to maintain, for each user actively using node 116, a list of the five (or any other number) most recently active processes. Those processes are more likely than other processes (less active, or passive) on node 116 to be involved with new connections, and can thus be searched first. For many processes, lower valued file descriptors tend to correspond to non-sockets (e.g., stdin, stdout, stderr). Yet another optimization is to preferentially search higher valued file descriptors (e.g., across processes) over lower valued file descriptors (that are less likely to yield matches).
- an agent may encounter a socket that does not correspond to the inode being matched against and is not already cached. The identity of that socket (and its corresponding inode) can be cached, once discovered, thus removing a future need to search for that pair.
- the collected information is then transmitted (205), e.g., by an agent (e.g., agent 112) to a data aggregator (e.g., data aggregator 114), which in turn provides the information to data platform 12.
- all information collected by an agent may be transmitted (e.g., to a data aggregator and/or to data platform 12).
- the amount of data transmitted may be minimized (e.g., for efficiency reasons), using various techniques.
- Constant data can be transmitted (210) once, when the agent first becomes aware of the process. And, if any changes to the constant data are detected (e.g., a process changes its parent), a refreshed version of the data can be transmitted (210) as applicable.
- variable data (i.e., data that may change over the lifetime of the process) can be transmitted (210) at periodic (or other) intervals.
- variable data may be transmitted in substantially real time as it is collected.
- the variable data may indicate a thread count for a process, a total virtual memory used by the process, the total resident memory used by the process, the total time spent by the process executing in user space, and/or the total time spent by the process executing in kernel space.
- the data may include a hash that may be used within data platform 12 to join process creation time attributes with runtime attributes to construct a full dataset.
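- The split between constant and variable data, and the join hash linking them, could be sketched as follows in Go (the record shapes and hash construction are assumptions for illustration, not the platform's actual format):

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// ConstantData holds per-process attributes fixed for the process lifetime.
type ConstantData struct {
	PID      int
	CmdLine  string
	ParentID int
	StartTS  int64
}

// VariableData holds attributes that change over the process lifetime.
type VariableData struct {
	Hash        string // joins runtime attributes to creation-time attributes
	ThreadCount int
	VirtMemKB   uint64
	ResMemKB    uint64
	UserTimeMS  uint64
	KernTimeMS  uint64
}

// joinHash is an illustrative join key over creation-time attributes.
func joinHash(c ConstantData) string {
	sum := sha256.Sum256([]byte(fmt.Sprintf("%d|%s|%d|%d",
		c.PID, c.CmdLine, c.ParentID, c.StartTS)))
	return hex.EncodeToString(sum[:8])
}

var sentConstant = map[string]ConstantData{}

func report(c ConstantData, v VariableData) {
	h := joinHash(c)
	if prev, ok := sentConstant[h]; !ok || prev != c {
		sentConstant[h] = c
		fmt.Println("send constant data once:", h) // step 210: first sighting or change
	}
	v.Hash = h
	fmt.Printf("send variable data: %+v\n", v) // periodic (or near real-time)
}

func main() {
	c := ConstantData{PID: 42, CmdLine: "/usr/sbin/ntpd", ParentID: 1, StartTS: 1700000000}
	report(c, VariableData{ThreadCount: 3, VirtMemKB: 20480})
	report(c, VariableData{ThreadCount: 4, VirtMemKB: 20992}) // constant data not re-sent
}
```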
- Core User Data: user name, UID (user ID), primary group, other groups, home directory.
- Failed Login Data: IP address, hostname, username, count.
- User Login Data: user name, hostname, IP address, start time, TTY (terminal), UID (user ID), GID (group ID), process, end time.
2. Machine Data
- Machine Data: hostname, domain name, architecture, kernel, kernel release, kernel version, OS, OS version, OS description, CPU, memory, model number, number of cores, last boot time, last boot reason, tags (e.g., cloud provider tags such as AWS, GCP, or Azure tags), default router, interface name, interface hardware address, interface IP address and mask, promiscuous mode.
- Network Connection Data: source IP address, destination IP address, source port, destination port, protocol, start time, end time, incoming and outgoing bytes, source process, destination process, direction of connection, histograms of packet length, inter-packet delay, session lengths, etc.
- Listening Ports in Server: source IP address, port number, protocol, process.
- Dropped Packet Data: source IP address, destination IP address, destination port, protocol, count.
- Arp Data: source hardware address, source IP address, destination hardware address, destination IP address.
- an agent such as agent 112 can be deployed in a container (e.g., a Docker container), and can also be used to collect information about containers. Collection about a container can be performed by an agent irrespective of whether the agent is itself deployed in a container or not (as the agent can be deployed in a container running in a privileged mode that allows for monitoring).
- agents are also configured to maintain histogram data for a given network connection, and provide the histogram data (e.g., in the Apache Avro data exchange format) under the Connection event type data.
- Examples of histograms include: (1) a packet length histogram (packet_len_hist), which characterizes network packet distribution; (2) a session length histogram (session_len_hist), which characterizes a network session length; (3) a session time histogram (session_time_hist), which characterizes a network session time; and (4) a session switch time histogram (session_switch_time_hist), which characterizes network session switch time (i.e., incoming->outgoing and vice versa). A sketch of histogram maintenance follows below.
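- A minimal sketch of per-connection histogram maintenance; the bucket boundaries are assumed, as the agents' actual bucketing is not specified here:

```go
package main

import "fmt"

// Histogram counts samples into fixed, assumed bucket boundaries.
type Histogram struct {
	Bounds []int // upper bounds; final bucket is overflow
	Counts []int
}

func NewHistogram(bounds []int) *Histogram {
	return &Histogram{Bounds: bounds, Counts: make([]int, len(bounds)+1)}
}

func (h *Histogram) Observe(v int) {
	for i, b := range h.Bounds {
		if v <= b {
			h.Counts[i]++
			return
		}
	}
	h.Counts[len(h.Bounds)]++ // overflow bucket
}

// ConnHistograms mirrors the four per-connection histograms named above.
type ConnHistograms struct {
	PacketLen         *Histogram // packet_len_hist
	SessionLen        *Histogram // session_len_hist
	SessionTime       *Histogram // session_time_hist
	SessionSwitchTime *Histogram // session_switch_time_hist
}

func main() {
	h := ConnHistograms{PacketLen: NewHistogram([]int{64, 512, 1500})}
	for _, n := range []int{60, 1400, 9000} {
		h.PacketLen.Observe(n)
	}
	fmt.Println(h.PacketLen.Counts) // [1 0 1 1]
}
```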
- data aggregator 114 may be configured to provide information (e.g., collected from nodes 108 by agents) to data platform 12.
- Data aggregator 128 may be similarly configured to provide information to data platform 12.
- both aggregator 114 and aggregator 128 may connect to a load balancer 130, which accepts connections from aggregators (and/or as applicable, agents), as well as other devices, such as computer 126 (e.g., when it communicates with web app 120), and supports fair balancing.
- portions of the 5-tuple may change - potentially frequently - but still be associated with the same behavior.
- for example, one application (e.g., Apache) may communicate with another application (e.g., Oracle), and either/both of Apache and Oracle may be multi-homed.
- This can lead to potentially thousands of 5-tuples (or more) that all correspond to Apache communicating with Oracle within a datacenter.
- Apache could be executed on a single machine, and could also be executed across fifty machines, which are variously spun up and down (with different IP addresses each time).
- An alternate representation of the 5-tuple of data 210 is depicted in region 217, and is logical.
- the logical representation of the 5-tuple aggregates the 5-tuple (along with other connections between Apache and Oracle having other 5-tuples) as logically representing the same connection.
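- A sketch of this aggregation in Go, with simplified record types and a toy IP/port-to-application resolver (in practice, the mapping comes from agent-collected process data):

```go
package main

import "fmt"

// FiveTuple is a physical connection record.
type FiveTuple struct {
	SrcIP, DstIP     string
	SrcPort, DstPort int
	Proto            string
}

// LogicalEdge aggregates all physical connections between two applications.
type LogicalEdge struct {
	SrcApp, DstApp string
}

// aggregate collapses physical tuples into logical edges using a
// (hypothetical) IP/port -> application resolver.
func aggregate(tuples []FiveTuple, appOf func(ip string, port int) string) map[LogicalEdge]int {
	edges := map[LogicalEdge]int{}
	for _, t := range tuples {
		e := LogicalEdge{SrcApp: appOf(t.SrcIP, t.SrcPort), DstApp: appOf(t.DstIP, t.DstPort)}
		edges[e]++ // potentially thousands of tuples per logical edge
	}
	return edges
}

func main() {
	appOf := func(ip string, port int) string {
		if port == 1521 {
			return "oracle"
		}
		return "apache" // toy resolver for illustration only
	}
	tuples := []FiveTuple{
		{"10.0.0.1", "10.0.9.9", 41000, 1521, "tcp"},
		{"10.0.0.2", "10.0.9.9", 41001, 1521, "tcp"}, // different machine, same behavior
	}
	fmt.Println(aggregate(tuples, appOf)) // map[{apache oracle}:2]
}
```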
- Fig. 2C depicts a portion of a logical polygraph.
- a datacenter has seven instances of the application update_engine 225, executing as seven different processes on seven different machines, having seven different IP addresses, and using seven different ports.
- the instances of update_engine variously communicate with update.core-os.net 226, which may have a single IP address or many IP addresses itself, over the one hour time period represented in the polygraph.
- update_engine is a client, connecting to the server update.core-os.net, as indicated by arrow 228.
- user B (an administrator of datacenter 106) can use data platform 12 to view visualizations of polygraphs in a web browser (e.g., as served to user B via web app 120).
- One type of polygraph user B can view is an application communication polygraph, which indicates, for a given one hour window (or any other suitable time interval), which applications communicated with which other applications.
- Another type of polygraph user B can view is an application launch polygraph.
- the first cluster includes (for the 9am- 10am window) five members (261) and the second cluster includes (for the same window) eight members (262).
- the reason for these two distinct clusters is that the two groups of applications behave differently (e.g., they exhibit two distinct sets of communication patterns).
- the instances of etcd2 in cluster 261 only communicate with locksmithctl (263) and other etcd2 instances (in both clusters 261 and 262).
- the instances of etcd2 in cluster 262 communicate with additional entities, such as etcdctl and Docker containers.
- user B can click on one of the clusters (e.g., cluster 261) and be presented with summary information about the applications included in the cluster, as is shown in Fig. 2I (e.g., in region 265).
- User B can also double click on a given cluster (e.g., cluster 261) to see details on each of the individual members of the cluster broken out.
- Fig. 2K illustrates another example of a portion of an application launch polygraph.
- user B has searched (270) for “python ma” to see how “python marathon_lb” (271) is launched.
- python marathon_lb is launched as a result of a chain of the same seven applications each time. If python marathon_lb is ever launched in a different manner, this indicates anomalous behavior.
- the behavior could be indicative of malicious activities, but could also be due to other reasons, such as a misconfiguration, a performance-related issue, and/or a failure, etc.
- if the administrator connects to a second virtual machine (e.g., using the same credentials), then uses the sudo command to change identities to those of another user, and then launches a program, graphs built by data platform 12 can be used to associate the administrator with each of his actions, including launching the program using the identity of another user.
- Fig. 2M illustrates an example of a portion of a privilege change graph, which identifies how privileges are changed between processes.
- when a user launches a process (e.g., “ls”), the process inherits the same privileges that the user has.
- Information included in the privilege change graph can be determined by examining the parent of each running process, and determining whether there is a match in privilege between the parent and the child. If the privileges are different, a privilege change has occurred (whether a change up or a change down).
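- A minimal sketch of that parent/child comparison in Go, using effective UID as a stand-in for privilege (the actual privilege model is not specified here):

```go
package main

import "fmt"

type Proc struct {
	PID, PPID int
	EUID      int // effective UID; 0 = root (used here as a simple privilege proxy)
	Name      string
}

// privilegeChanges compares each process against its parent and records
// an edge whenever the effective privilege differs (up or down).
func privilegeChanges(procs map[int]Proc) []string {
	var edges []string
	for _, p := range procs {
		parent, ok := procs[p.PPID]
		if !ok || parent.EUID == p.EUID {
			continue // no parent visible, or no privilege change
		}
		dir := "drop"
		if p.EUID == 0 || p.EUID < parent.EUID {
			dir = "escalation" // e.g., ntpd or sudo escalating to root
		}
		edges = append(edges, fmt.Sprintf("%s(%d) -> %s(%d): %s",
			parent.Name, parent.EUID, p.Name, p.EUID, dir))
	}
	return edges
}

func main() {
	procs := map[int]Proc{
		100: {PID: 100, PPID: 1, EUID: 1000, Name: "bash"},
		101: {PID: 101, PPID: 100, EUID: 0, Name: "sudo"}, // temporary escalation
	}
	for _, e := range privilegeChanges(procs) {
		fmt.Println(e)
	}
}
```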
- the application ntpd is one rare example of a scenario in which a process escalates (272) to root, and then returns back (273).
- the sudo command is another example (e.g., used by an administrator to temporarily have a higher privilege).
- ntpd's privilege change actions, and the legitimate actions of various administrators (e.g., using sudo), will be incorporated into a baseline model by data platform 12.
- when deviations occur, such as where a new application that is not ntpd escalates privilege, or where an individual that has not previously / does not routinely use sudo does so, such behaviors can be identified as anomalous.
- Fig. 2N illustrates an example of a portion of a user login graph, which identifies which users log into which logical nodes.
- Physical nodes (whether bare metal or virtualized) are clustered into a logical machine cluster, for example, using yet another graph, a machine-server graph, an example of which is shown in Fig. 2O.
- a determination is made as to what type of machine it is, based on what kind(s) of workflows it runs.
- some machines run as master nodes (having a typical set of workflows they run, as master nodes) and can thus be clustered as master nodes.
- Worker nodes are different from master nodes, for example, because they run Docker containers, and frequently change as containers move around. Worker nodes can similarly be clustered.
- the polygraph depicted in Fig. 2E corresponds to activities in a datacenter in which, in a given hour, approximately 500 virtual machines collectively run one million processes, and make 100 million connections in that hour.
- the polygraph represents a drastic reduction in size (e.g., from tracking information on 100 million connections in an hour, to a few hundred nodes and a few hundred edges).
- a polygraph can be constructed (e.g., using commercially available computing infrastructure) in less than an hour (e.g., within a few minutes).
- ongoing hourly snapshots of a datacenter can be created within a two hour moving window (i.e., collecting data for the time period 8am-9am, while also generating a snapshot for the previous time period 7am-8am).
- the following describes various example infrastructure that can be used in polygraph construction, and also describes various techniques that can be used to construct polygraphs.
- embodiments of data platform 12 may be built using any suitable infrastructure as a service (IaaS) (e.g., AWS).
- data platform 12 can use Simple Storage Service (S3) for data storage, Key Management Service (KMS) for managing secrets, Simple Queue Service (SQS) for managing messaging between applications, Simple Email Service (SES) for sending emails, and Route 53 for managing DNS.
- Other infrastructure tools can also be used.
- Examples include: orchestration tools (e.g., Kubernetes or Mesos/Marathon), service discovery tools (e.g., Mesos-DNS), service load balancing tools (e.g., marathon-LB), container tools (e.g., Docker or rkt), log/metric tools (e.g., collectd, fluentd, kibana, etc.), big data processing systems (e.g., Spark, Hadoop, AWS Redshift, Snowflake, etc.), and distributed key value stores (e.g., Apache Zookeeper or etcd2).
- data platform 12 may make use of a collection of microservices.
- Each microservice can have multiple instances, and may be configured to recover from failure, scale, and distribute work amongst various such instances, as applicable.
- microservices are auto-balancing for new instances, and can distribute workload if new instances are started or existing instances are terminated.
- microservices may be deployed as self-contained Docker containers.
- a Mesos-Marathon or Spark framework can be used to deploy the microservices (e.g., with Marathon monitoring and restarting failed instances of microservices as needed).
- the service etcd2 can be used by microservice instances to discover how many peer instances are running, and used for calculating a hash-based scheme for workload distribution.
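- A sketch of such a hash-based scheme in Go, under the assumption that each instance learns the current peer count (e.g., from etcd2) and knows its own index:

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// ownsWork decides whether this instance should process a given work key,
// given the peer count discovered (e.g., via etcd2) and this instance's index.
func ownsWork(key string, peers, myIndex int) bool {
	h := fnv.New32a()
	h.Write([]byte(key))
	return int(h.Sum32())%peers == myIndex
}

func main() {
	customers := []string{"entity-a", "entity-b", "entity-c", "entity-d"}
	peers := 3 // discovered peer instance count
	for idx := 0; idx < peers; idx++ {
		for _, c := range customers {
			if ownsWork(c, peers, idx) {
				fmt.Printf("instance %d handles %s\n", idx, c)
			}
		}
	}
	// If an instance starts or stops, peers changes and the workload
	// redistributes on the next pass (auto-balancing, as described above).
}
```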
- Microservices may be configured to publish various health/status metrics to either an SQS queue, or etcd2, as applicable.
- Amazon DynamoDB can be used for state management.
- Graph generator 146 is a microservice that may be responsible for generating raw behavior graphs on a per customer basis periodically (e.g., once an hour). In particular, graph generator 146 may generate graphs of entities (as the nodes in the graph) and activities between entities (as the edges). In various embodiments, graph generator 146 also performs other functions, such as aggregation, enrichment (e.g., geolocation and threat), reverse DNS resolution, TF-IDF based command line analysis for command type extraction, parent process tracking, etc.
- Graph generator 146 may perform joins on data collected by the agents, so that both sides of a behavior are linked.
- first and second virtual machines may each report information on their view of the communication (e.g., the PID of their respective processes, the amount of data exchanged and in which direction, etc.).
- when graph generator 146 performs a join on the data provided by both agents, the graph will include a node for each of the processes, and an edge indicating communication between them (as well as other information, such as the directionality of the communication, i.e., which process acted as the server and which as the client in the communication).
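- A sketch of that join in Go, with assumed record shapes: each agent reports its half of the connection, and normalizing the 5-tuple produces a join key that links client and server processes into one edge:

```go
package main

import "fmt"

// ConnReport is one agent's view of a connection it observed locally.
type ConnReport struct {
	LocalIP, RemoteIP     string
	LocalPort, RemotePort int
	PID                   int
	Outgoing              bool // true if this side initiated (the client)
}

// key normalizes both sides of a connection to the same join key.
func key(r ConnReport) string {
	if r.Outgoing {
		return fmt.Sprintf("%s:%d->%s:%d", r.LocalIP, r.LocalPort, r.RemoteIP, r.RemotePort)
	}
	return fmt.Sprintf("%s:%d->%s:%d", r.RemoteIP, r.RemotePort, r.LocalIP, r.LocalPort)
}

func main() {
	// VM1's process 100 connects to VM2's process 200.
	a := ConnReport{"10.0.0.1", "10.0.0.2", 40000, 443, 100, true}
	b := ConnReport{"10.0.0.2", "10.0.0.1", 443, 40000, 200, false}
	if key(a) == key(b) {
		// The joined edge carries both PIDs plus directionality:
		// which process acted as the client and which as the server.
		fmt.Printf("edge: pid %d (client) -> pid %d (server)\n", a.PID, b.PID)
	}
}
```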
- connections are process to process (e.g., from a process on one virtual machine within the cloud environment associated with entity A to another process on a virtual machine within the cloud environment associated with entity A).
- a process may be in communication with a node (e.g., outside of entity A) which does not have an agent deployed upon it.
- a node within entity A might be in communication with node 172, outside of entity A.
- communications with node 172 are modeled (e.g., by graph generator 146) using the IP address of node 172.
- the IP address of the node can be used by graph generator in modeling.
- Graph generator 146 can be implemented in any appropriate programming language, such as Java or C, and machine learning libraries, such as Spark’s MLLib.
- Example ways that graph generator computations can be implemented include using SQL or Map-R (e.g., using Spark or Hadoop).
- SSH tracker 148 is a microservice that may be responsible for following ssh connections and process parent hierarchies to determine trails of user ssh activity. Identified ssh trails are placed by the SSH tracker 148 into data store 30 and cached for further processing.
- SSH tracker 148 can be implemented in any appropriate programming language, such as Java or C, and machine learning libraries, such as Spark's MLLib.
- Example ways that SSH tracker computations can be implemented include using SQL or Map-R (e.g., using Spark or Hadoop).
- Threat aggregator 150 is a microservice that may be responsible for obtaining third party threat information from various applicable sources, and making it available to other microservices. Examples of such information include reverse DNS information, GeoIP information, lists of known bad domains/IP addresses, lists of known bad files etc. As applicable, the threat information is normalized before insertion into data store 30. Threat aggregator 150 can be implemented in any appropriate programming language, such as Java or C, using SQL/JDBC libraries to interact with data store 30 (e.g., for insertions and queries).
- Scheduler 152 is a microservice that may act as a scheduler and that may run arbitrary jobs organized as a directed graph. In some examples, scheduler 152 ensures that all jobs for all customers are able to run during a given time interval (e.g., every hour). Scheduler 152 may handle errors and retrying for failed jobs, track dependencies, manage appropriate resource levels, and/or scale jobs as needed. Scheduler 152 can be implemented in any appropriate programming language, such as Java or C. A variety of components can also be used, such as open source scheduler frameworks (e.g., Airflow) or AWS services (e.g., AWS Data Pipeline), which can be used for managing schedules.
- Graph Behavior Modeler (GBM) 154 is a microservice that may compute polygraphs.
- GBM 154 can be used to find clusters of nodes in a graph that should be considered similar based on some set of their properties and relationships to other nodes. As described herein, the clusters and their relationships can be used to provide visibility into a datacenter environment without requiring user specified labels. GBM 154 may track such clusters over time persistently, allowing for changes to be detected and alerts to be generated.
- GBM 154 may take as input a raw graph (e.g., as generated by graph generator 146). Nodes are actors of a behavior, and edges are the behavior relationship itself. For example, in the case of communication, example actors include processes, which communicate with other processes.
- the GBM 154 clusters the raw graph based on behaviors of actors and produces a summary (the polygraph). The polygraph summarizes behavior at a datacenter level.
- the GBM also produces “observations” that represent changes detected in the datacenter. Such observations may be based on differences in cumulative behavior (e.g., the baseline) of the datacenter with its current behavior.
- the GBM 154 can be implemented in any appropriate programming language, such as Java, C, or Golang, using appropriate libraries (as applicable) to handle distributed graph computations (handling large amounts of data analysis in a short amount of time).
- Apache Spark is another example tool that can be used to compute polygraphs.
- the GBM can also take feedback from users and adjust the model according to that feedback. For example, if a given user is interested in relearning behavior for a particular entity, the GBM can be instructed to “forget” the implicated part of the polygraph.
- GBM runner 156 is a microservice that may be responsible for interfacing with GBM 154 and providing GBM 154 with raw graphs (e.g., using a query language, such as SQL, to push any computations it can to data store 30). GBM runner 156 may also insert polygraph output from GBM 154 to data store 30. GBM runner 156 can be implemented in any appropriate programming language, such as Java or C, using SQL/JDBC libraries to interact with data store 30 to insert and query data.
- Alert generator 158 is a microservice that may be responsible for generating alerts.
- Alert generator 158 may examine observations (e.g., produced by GBM 154) in aggregate, deduplicate them, and score them. Alerts may be generated for observations with a score exceeding a threshold.
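- A toy sketch of that aggregate/deduplicate/score pipeline in Go (the observation fields, weights, and threshold are assumptions; per the text, real scoring may use machine learning libraries such as Spark's MLLib):

```go
package main

import "fmt"

type Observation struct {
	Kind   string  // e.g., "new_parent", "new_external_connection"
	Key    string  // identifies the implicated entity/edge
	Weight float64 // contribution to the aggregate score
}

const alertThreshold = 1.0 // assumed threshold

// alerts merges duplicate observations by (Kind, Key), sums their weights
// into a score, and emits an alert for any score exceeding the threshold.
func alerts(obs []Observation) []string {
	scores := map[[2]string]float64{}
	for _, o := range obs {
		scores[[2]string{o.Kind, o.Key}] += o.Weight
	}
	var out []string
	for k, s := range scores {
		if s > alertThreshold {
			out = append(out, fmt.Sprintf("ALERT %s on %s (score %.1f)", k[0], k[1], s))
		}
	}
	return out
}

func main() {
	obs := []Observation{
		{"new_parent", "apache", 0.7},
		{"new_parent", "apache", 0.7}, // duplicate observation, merged
		{"new_user", "node116", 0.3},  // below threshold, suppressed
	}
	for _, a := range alerts(obs) {
		fmt.Println(a)
	}
}
```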
- Alert generator 158 may also compute (or retrieve, as applicable) data that a customer (e.g., user A or user B) might need when reviewing the alert. Examples of events that can be detected by data platform 12 (and alerted on by alert generator 158) include, but are not limited to the following:
- new user: This event may be created when a user (e.g., of node 116) is first observed by an agent within a datacenter.
- new external connection: This event may be generated when a connection to an external IP/domain is made from a new application.
- new parent: This event may be generated when an application is launched by a different parent.
- connection to known bad IP/domain: Data platform 12 maintains (or can otherwise access) one or more reputation feeds. If an environment makes a connection to a known bad IP or domain, an event will be generated.
- An event may be generated when a successful connection to a datacenter from a known bad IP is observed by data platform 12.
- Alert generator 158 can be implemented in any appropriate programming language, such as Java or C, using SQL/JDBC libraries to interact with data store 30 to insert and query data. In various embodiments, alert generator 158 also uses one or more machine learning libraries, such as Spark’s MLLib (e.g., to compute scoring of various observations). Alert generator 158 can also take feedback from users about which kinds of events are of interest and which to suppress.
- QsJobServer 160 is a microservice that may look at all the data produced by data platform 12 for an hour, and compile a materialized view (MV) out of the data to make queries faster. The MV helps make sure that the queries customers most frequently run, and data that they search for, can be easily queried and answered.
- Reporting module 164 is a microservice that may be responsible for creating reports out of customer data (e.g., daily summaries of events, etc.) and providing those reports to customers (e.g., via email). Reporting module 164 can be implemented using any appropriate programming language, such as Java or C. Reporting module 164 can be configured to use an email service (e.g., AWS SES or pagerduty) to send emails. Reporting module 164 may also provide templating support (e.g., Velocity or Moustache) to manage templates (e.g., for constructing HTML-based email).
- Web app 120 is a microservice that provides a user interface to data collected and processed on data platform 12.
- Web app 120 may provide login, authentication, query, data visualization, etc. features.
- Web app 120 may, in some embodiments, include both client and server elements.
- Example ways the server elements can be implemented are using Java DropWizard or Node.js to serve business logic, and a combination of JSON/HTTP to manage the service.
- Example ways the client elements can be implemented are using frameworks such as React, Angular, or Backbone. JSON, jQuery, and JavaScript libraries (e.g., underscore) can also be used.
- Cache 170 may be implemented by Redis and/or any other service that provides a key-value store.
- Data platform 12 can use cache 170 to keep information for frontend services about users. Examples of such information include valid tokens for a customer, valid cookies of customers, the last time a customer tried to login, etc.
- Fig. 3A illustrates an example of a process for detecting anomalies in a network environment.
- process 300 is performed by data platform 12.
- the process begins at 301 when data associated with activities occurring in a network environment (such as entity A's datacenter) is received.
- The received data can include the agent-collected data described above (e.g., in conjunction with process 200).
- the cumulative graph (also referred to herein as a cumulative PType graph and a polygraph) is a running model of how processes behave over time. Nodes in the cumulative graph are PType nodes, and provide information such as a list of all active processes and PIDs in the last hour, the number of historic total processes, the average number of active processes per hour, the application type of the process (e.g., the CType of the PType), and historic CType information/frequency.
- Edges in the cumulative graph can represent connectivity and provide information such as connectivity frequency.
- the edges can be weighted (e.g., based on number of connections, number of bytes exchanged, etc.).
- Edges in the cumulative graph (and snapshots) can also represent transitions.
- One approach to merging a snapshot of the activity of the last hour into a cumulative graph is as follows. An aggregate graph of physical connections is made for the connections included in the snapshot (as was previously done for the original snapshot used during bootstrap). And, clustering/splitting is similarly performed on the snapshot’s aggregate graph. Next, PType clusters in the snapshot’s graph are compared against PType clusters in the cumulative graph to identify commonality.
- if a surviving node (i.e., one present in both the cumulative graph and the snapshot graph) changes PType, the change is noted as a transition, and an alert can be generated.
- if a surviving node changes PType and also changes CmdType, the severity of the alert can be increased.
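- A simplified Go sketch of the merge-and-compare step, with assumed node records; surviving nodes whose PType changed produce transitions, with severity raised when CmdType also changed:

```go
package main

import "fmt"

type NodeState struct {
	PType   string
	CmdType string
}

// mergeSnapshot folds an hourly snapshot into the cumulative graph and
// reports transitions for surviving nodes whose cluster assignment changed.
func mergeSnapshot(cumulative, snapshot map[string]NodeState) {
	for id, cur := range snapshot {
		prev, survived := cumulative[id]
		if survived && prev.PType != cur.PType {
			severity := "normal"
			if prev.CmdType != cur.CmdType {
				severity = "elevated" // PType and CmdType both changed
			}
			fmt.Printf("transition alert (%s): %s moved %s -> %s\n",
				severity, id, prev.PType, cur.PType)
		}
		cumulative[id] = cur // merge the node into the running model
	}
}

func main() {
	cumulative := map[string]NodeState{
		"proc-a": {PType: "ptype-1", CmdType: "apache"},
	}
	snapshot := map[string]NodeState{
		"proc-a": {PType: "ptype-2", CmdType: "apache"}, // survived, changed cluster
		"proc-b": {PType: "ptype-1", CmdType: "etcd2"},  // new node
	}
	mergeSnapshot(cumulative, snapshot)
}
```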
- an aggregated physical graph can be generated on a per customer basis periodically (e.g., once an hour) from raw physical graph information, by matching connections (e.g., between two processes on two virtual machines).
- a deterministic fixed approach is used to cluster nodes in the aggregated physical graph (e.g., representing processes and their communications).
- Matching Neighbors Clustering (MNC) can be performed on the aggregated physical graph to determine which entities exhibit identical behavior and to cluster such entities together.
- Fig. 3B depicts a set of example processes (p1, p2, p3, and p4) communicating with other processes (p10 and p11).
- Fig. 3B is a graphical representation of a small portion of an aggregated physical graph showing (for a given time period, such as an hour) which processes in a datacenter communicate with which other processes.
- processes p1, p2, and p3 will be clustered together (305), as they exhibit identical behavior (they communicate with p10 and only p10).
- Process p4, which communicates with both p10 and p11, will be clustered separately.
- In MNC, only those processes exhibiting identical (communication) behavior will be clustered.
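- A minimal Go sketch of MNC over the Fig. 3B example, grouping nodes by an exact signature of their neighbor sets:

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// matchingNeighborsCluster groups nodes whose neighbor sets are identical.
func matchingNeighborsCluster(adj map[string][]string) map[string][]string {
	clusters := map[string][]string{}
	for node, neighbors := range adj {
		ns := append([]string(nil), neighbors...)
		sort.Strings(ns)
		sig := strings.Join(ns, ",") // exact-behavior signature
		clusters[sig] = append(clusters[sig], node)
	}
	for _, members := range clusters {
		sort.Strings(members)
	}
	return clusters
}

func main() {
	// The Fig. 3B example: p1, p2, p3 talk only to p10; p4 talks to p10 and p11.
	adj := map[string][]string{
		"p1": {"p10"},
		"p2": {"p10"},
		"p3": {"p10"},
		"p4": {"p10", "p11"},
	}
	for sig, members := range matchingNeighborsCluster(adj) {
		fmt.Printf("neighbors {%s}: cluster %v\n", sig, members)
	}
	// Output (map order may vary):
	// neighbors {p10}: cluster [p1 p2 p3]
	// neighbors {p10,p11}: cluster [p4]
}
```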
- an alternate clustering approach can also/instead be used, which uses a similarity measure (e.g., constrained by a threshold value, such as a 60% similarity) to cluster items.
- In some embodiments, the output of MNC is used as input to SimRank; in other embodiments, MNC is omitted.
- the node input to the PTypeConn model for a given time period includes non-interactive (i.e., not associated with a tty) process nodes that had connections in the time period, and the base graph nodes of other types (IP Service Endpoint (IPSep), comprising an IP address and a port; DNS Service Endpoint (DNSSep); and IP Address) that have been involved in those connections.
- the node input to this model for a given time period includes process nodes that are running in that period.
- the value class of process nodes is CType (similar to how it is created for the PTypeConn model).
- the base relationship that is used for clustering is a parent process with a given CType launching a child process with another given destination CType.
- the physical input for this model includes process nodes (only), with the caveat that the complete ancestor hierarchy of active process nodes (i.e., those that are running) for a given time period is provided as input even if an ancestor is not active in that time period. Note that launch relationships are available from ppid_hash fields in the process node properties.
- An MTypeConn Model may cluster nodes of the same class that have similar connectivity relationships. For example, if two machines had similar incoming neighbors of the same class and outgoing neighbors of the same class, they could be clustered.
- the machine_terms property in the Machine nodes is used, in various embodiments, for labeling machines that are clustered together. If a majority of the machines clustered together share a term in the machine_terms, that term can be used for labeling the cluster.
- the class value for IPSep nodes is determined as follows:
- the class value for IpAddress nodes is determined as follows:
- GBM 154 generates multiple events of this type for the same class value.
- the set size indicates the size of the cluster referenced in the keys field.
- NewClass events can be generated if there is more than one cluster in that class. NewNode events will not be generated separately in this case.
- the key field contains source and destination class values and also source and destination cluster identifiers (i.e., the src/dst_node:key.cid represents the src/dst cluster identifier).
- an event of this type could involve multiple edges between different cluster pairs that have the same source and destination class values.
- GBM 154 can generate multiple events in this case with different source and destination cluster identifiers.
- extended user session tracking: it may be desirable to know which operations are performed (as the superuser) by which original user when debugging issues.
- extended user session tracking is not limited to the ssh protocol, and the techniques described herein can be extended to other login mechanisms.
- the source fields correspond to the side from which the connection was initiated.
- the agent associates an ssh connection with the privileged ssh process that is created for that connection.
- B3 is a descendant of B1 and is thus associated with LS2.
- Fig. 3J illustrates an example of a process for performing extended user tracking.
- process 380 is performed periodically (e.g., once an hour in a batch fashion) by ssh tracker 148 to generate new output data.
- batch processing allows for efficient analysis of large volumes of data.
- the approach can be adapted, as applicable, to process input data in a record-by-record fashion while maintaining the same logical data processing flow.
- the results of a given portion of process 380 are stored for use in a subsequent portion.
- the query uses a range filter on the login_time column, where the range is from start_time_of_current_period - lookback_time to start_time_of_current_period. (No records are obtained as a result of performing 385 on the scenario depicted in Fig. 3I, as only a single time period is applicable in the example scenario.)
- the recursive approach can be considered to include multiple sub-steps where new processes that are identified to be ssh local descendants in the current sub-step are considered as ancestors for the next step.
- descendancy relationships will be established in two sub-steps:
- Process B3 is a local descendant of LS2 because it is a child of process B1, which is associated with LS2 in sub-step 1.
- Implementation portion 387 can use a datastore that supports recursive query capabilities, or queries can be constructed to process multiple conceptual sub-steps at once. In the example depicted in Fig. 3I, the processing performed at 387 will generate the records depicted in Fig. 3P. Note that the ssh privileged processes associated with the logins are also included, as they are part of the login session.
- the lineage of new ssh logins created in the current time period is determined by associating their ssh connections to source processes that may be descendants of other ssh logins (which may have been created in the current period or previous time periods). In order to do so, first an attempt is made to join the new ssh login connections in the current period (identified at
- output data is generated.
- the new login-connection, login-local-descendant, and login-lineage records generated at 384, 387, and 388 are inserted into their respective output tables (e.g., in a transactional manner).
- the agent can configure the network stack (e.g., using iptables functionality on Linux) to intercept an outgoing TCP SYN packet and modify it to add the generated GUID as a TCP option.
- Joe can relinquish his root privileges by closing out of any additional shells created, reverting back to a shell created for user joe.smith.
- In a typical day in a datacenter, a user (e.g., Joe Smith) will log in, run various processes, and (optionally) log out. The user will typically log in from the same set of IP addresses, from IP addresses within the same geographical area (e.g., city or country), or from historically known IP addresses/geographical areas (i.e., ones the user has previously/occasionally used). A deviation from the user's typical (or historical) behavior indicates a change in login behavior. However, it does not necessarily mean that a breach has occurred.
- a user may take a variety of actions. As a first example, a user might execute a binary/script.
- the above information associated with user behavior is broken into four tiers.
- the tiers represent example types of information that data platform 12 can use in modeling user behavior:
- the user's entry point (e.g., domains, IP addresses, and/or geolocation information such as country/city) from which the user logs in.
- the user executes a script (“collect_data.sh”) on Machine03.
- the script internally communicates (as root) to a MySQL-based service internal to the datacenter, and downloads data from the MySQL-based service.
- the user externally communicates with a server outside the datacenter ("External01"), using a POST command.
- the source/entry point is IP01.
- Data is transferred to an external server, External01.
- the machine performing the transfer to External01 is Machine03.
- the user transferring the data is “root” (on Machine03), while the actual user (hiding behind root) is UserA.
- the "original user" (ultimately responsible for transmitting data to External01) is UserA, who logged in from IP01.
- Each of the processes ultimately started by UserA, whether started at the command line (tty) such as "runnable.sh" or started after an ssh connection such as "new_runnable.sh," and whether as UserA or as a subsequent identity, are all examples of child processes which can be arranged into a process hierarchy.
- machines can be clustered together logically into machine clusters.
- One approach to clustering is to classify machines based on information such as the types of services they provide/binaries they have installed upon them/processes they execute. Machines sharing a given machine class (as they share common binaries/services/etc.) will behave similarly to one another.
- Each machine in a datacenter can be assigned to a machine cluster, and each machine cluster can be assigned an identifier (also referred to herein as a machine class).
- One or more tags can also be assigned to a given machine class (e.g., database_servers_west or prod_web_frontend).
- One approach to assigning a tag to a machine class is to apply term frequency analysis (e.g., TF/IDF) to the applications run by a given machine class, selecting as tags those most unique to the class.
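- a minimal TF-IDF sketch of this tagging approach, treating each machine class as a "document" of the application names it runs (the class names and application lists are invented for illustration; scikit-learn is used for convenience):

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Pick the highest-scoring (most class-specific) terms per class as tags.
classes = {
    "db_west":   "mysqld mysqldump cron sshd",
    "web_front": "nginx gunicorn cron sshd",
}
vectorizer = TfidfVectorizer()
matrix = vectorizer.fit_transform(classes.values())
terms = vectorizer.get_feature_names_out()
for row, name in zip(matrix.toarray(), classes):
    tags = [t for _, t in sorted(zip(row, terms), reverse=True)[:2]]
    print(name, tags)  # e.g., db_west ['mysqldump', 'mysqld']
```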
- Data platform 12 can use behavioral baselines taken for a class of machines to identify deviations from the baseline (e.g., by a particular machine in the class).
- Fig. 4A illustrates a representation of an embodiment of an insider behavior graph.
- each node in the graph can be: (1) a cluster of users; (2) a cluster of launched processes; (3) a cluster of processes/servers running on a machine class; (4) a cluster of external IP addresses (of incoming clients); or (5) a cluster of external servers based on DNS/IP/etc.
- graph data is vertically tiered into four tiers.
- Tier 0 (400) corresponds to entry point information (e.g., domains, IP addresses, and/or geolocation information) associated with a client entering the datacenter from an external entry point. Entry points are clustered together based on such information.
- Tier 1 corresponds to a user on a machine class, with a given user on a given machine class represented as a node.
- Tier 2 corresponds to launched processes, child processes, and/or interactive processes. Processes for a given user and having similar connectivity (e.g., sharing the processes they launch and the machines with which they communicate) are grouped into nodes.
- Tier 3 corresponds to the services/servers/domains/IP addresses with which processes communicate.
- Tier 0 nodes log in to tier 1 nodes.
- Tier 1 nodes launch tier 2 nodes.
- Tier 2 nodes connect to tier 3 nodes.
- Tier 1 corresponds to a user (e.g., user “U”) logging into a machine having a particular machine class (e.g., machine class “M”).
- Tier 2 is a cluster of processes having command line similarity (e.g., CType "C"), having an original user "U," and running as a particular effective user (e.g., user "U1").
- the effective user in the Tier 2 node may or may not match the original user (while the original user in the Tier 2 node will match the original user in the Tier 1 node).
- a change from a user U into a user U1 can take place in a variety of ways. Examples include where U becomes U1 on the same machine (e.g., via su), and also where U sshes to other machine(s). In both situations, U can perform multiple changes, and can combine approaches. For example, U can become U1 on a first machine, ssh to a second machine (as U1), become U2 on the second machine, and ssh to a third machine (whether as user U2 or user U3).
- the complexity of how user U ultimately becomes U3 (or U5, etc.) is hidden from a viewer of an insider behavior graph, and only an original user (e.g., U) and the effective user of a given node (e.g., U5) are depicted.
- additional detail about the path (e.g., an end-to-end path of edges from user U to user U5) can be surfaced (e.g., via user interactions with nodes).
- Fig. 4B illustrates an example of a portion of an insider behavior graph (e.g., as rendered in a web browser).
- node 405 (the external IP address, 52.32.40.231) is an example of a Tier 0 node, and represents an entry point into a datacenter.
- two users, "aruneli_prod" and "harish_prod," both made use of the source IP 52.32.40.231 when logging in between 5pm and 6pm on Sunday July 30 (408).
- Nodes 409 and 410 are examples of Tier 1 nodes, having aruneli_prod and harish_prod as associated respective original users.
- a user communicates with an external server which has a geolocation not previously used by that user.
- Such changes can be surfaced as alerts, e.g., to help an administrator determine when/what anomalous behavior occurs within a datacenter.
- the behavior graph model can be used (e.g., during forensic analysis) to answer questions helpful during an investigation. Examples of such questions include:
- FIG. 4C depicts a baseline of behavior for a user, “Bill.”
- Bill typically logs into a datacenter from the IP address, 71.198.44.40 (427). He typically makes use of ssh (428), and sudo (429), makes use of a set of typical applications (430) and connects (as root) with the external service, api.lacework.net (431).
- FIG. 4D depicts an embodiment of how the graph depicted in Fig. 4C would appear once Eve begins exfiltrating data from the datacenter.
- Eve logs into the datacenter (using Bill’s credentials) from 52.5.66.8 (432).
- Eve then changes to another user, Alex (e.g., via su alex).
- Eve executes a script, "sneak.sh" (433), which launches another script, "post.sh" (434), which contacts external server 435, which has an IP address of 52.5.66.7, and transmits data to it.
- As an example of a benign edge, suppose Bill begins working from a home office two days a week. The first time he logs in from his home office (i.e., from an IP address that is not 71.198.44.40), an alert can be generated that he has logged in from a new location. Over time, however, as Bill continues to log in from his home office but otherwise engages in typical activities, Bill's graph will evolve to include logins from both 71.198.44.40 and his home office as baseline behavior. Similarly, if Bill begins using a new tool in his job, an alert can be generated the first time he executes the tool, but over time the tool will become part of his baseline. In some cases, a single edge can indicate a serious threat.
- Examples of alerts that can be generated using the user login graph include:
- FIG. 4F is a representation of an example of an interface that depicts such an alert. Specifically, as indicated in region 447, an alert for the time period 1pm-2pm on June 8 was generated. The alert identifies that a new user, Bill (448), executed a process.
- a JSON file can be used to store multiple cards (e.g., as part of a query service catalog).
- a particular card is represented by a single JSON object with a unique name as a field name.
- Each card may be described by the following named fields:
- SOURCES: a JSON array object explicitly specifying references of input entities. Each source reference has the following attributes:
- RETURNS: a required JSON array object of return field objects.
- a return field object can be described by the following attributes:
- PROPS: generic JSON objects for other entity properties.
- JOINS: a JSON array of join operators. Possible fields for a join operator include: type (possible join types include: "loj" - Left Outer Join, "join" - Inner Join, "in" - Semi Join, "implicit" - Implicit Join).
- FKEYS: a JSON array of FilterKey(s).
- the fields for a FilterKey are:
- FILTERS: a JSON array of filters (conjunct). Possible fields for a filter include:
- ORDERS: a JSON array of ORDER BY clauses for returned fields. Possible attributes for the ORDER BY clause include:
- GROUPS: a JSON array of GROUP BY clauses for returned fields. Field attributes are: field (ordinal index (1-based) or alias from the return fields).
- LIMIT: a limit for the number of records to be returned.
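- a hypothetical card using the named fields above might look as follows (expressed here as a Python dict mirroring the JSON object; all entity, field, and parameter names are invented for illustration, not taken from the actual catalog):

```python
# A single card: a JSON object keyed by a unique card name.
example_card = {
    "UserLoginSummary": {
        "SOURCES": [{"name": "LoginEvents", "alias": "le"}],
        "RETURNS": [
            {"field": "le.username", "alias": "user"},
            {"field": "count(*)", "alias": "logins"},
        ],
        "JOINS": [{"type": "loj", "left": "le", "right": "Machines"}],
        "FKEYS": [{"name": "user_key", "fields": ["le.username"]}],
        "FILTERS": [{"field": "le.login_time", "op": ">=", "value": ":start"}],
        "ORDERS": [{"field": 2, "dir": "desc"}],
        "GROUPS": [{"field": "le.username"}],
        "LIMIT": 100,
    }
}
```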
- a second extensibility scenario is one in which a FilterKey in the filter data set is extended (i.e., existing template functions are used to define a new data set).
- data sets used by data platform 12 are composable/reusable/extensible, irrespective of whether the data sets are relational or graph data sets.
- One example data set is the User Tracking polygraph, which is generated as a graph data set (comprising nodes and edges). Like other polygraphs, User Tracking is an external data set that can be visualized both as a graph (via the nodes and edges) and can also be used as a filter data set for other cards, via the cluster identifier (CID) field.
- Dynamic composition of filter datasets can be implemented using FilterKeys and FilterKey Types.
- a FilterKey can be defined as a list of columns and/or fields in a nested structure (e.g., JSON). Instances of the same FilterKey Type can be formed as an Implicit Join Group. The same instance of a FilterKey can participate in different Implicit Join Groups.
- a list of relationships among all possible Implicit Join Groups is represented as a Join graph for the entire search space to create a final data filter set by traversing edges and producing Join Path(s).
- query service 166 parses the list of implicit joins and creates a Join graph to manifest relationships of FilterKeys among Entities.
- a Join graph (an example of which is depicted in Fig. 4K) comprises a list of Join Link(s).
- a Join Link represents each implicit join group formed by the same FilterKey type.
- a Join Link maintains a reverse map (Entity-to-FilterKey) of FilterKeys and their Entities. As previously mentioned, Entities can have more than one FilterKey defined. The reverse map guarantees that one FilterKey per Entity can be used for each JoinLink.
- Each JoinLink also maintains a list of entities for the priority order of joins.
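- the Join graph structures described above can be sketched as follows (attribute names are illustrative assumptions, not the actual implementation):

```python
from dataclasses import dataclass, field

@dataclass
class JoinLink:
    """One implicit join group, formed by a shared FilterKey type."""
    filterkey_type: str
    entity_to_filterkey: dict = field(default_factory=dict)  # one key per entity
    join_priority: list = field(default_factory=list)        # entity join order

@dataclass
class JoinGraph:
    """The search space: a list of JoinLink(s) traversed to build Join Paths."""
    links: list = field(default_factory=list)
```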
- Fig. 4L illustrates an example of a process for dynamically generating and executing a query.
- process 485 is performed by data platform 12.
- the process begins at 486 when a request is received to filter information associated with activities within a network environment.
- One example of such a request occurs in response to user A clicking on tab 465.
- Another example of such a request occurs in response to user A clicking on link 464-1.
- Yet another example of such a request occurs in response to user A clicking on link 464-2 and selecting (e.g., from a dropdown) an option to filter (e.g., include, exclude) based on specific criteria that she provides (e.g., an IP address, a username, a range of criteria, etc.).
- An implicit join filter (EntityFilter) is implemented as a subclass of Entity, similar to the right hand side of a SQL semi-join operator. Unlike the static SQL semi-join construct, it is conditionally and recursively generated even if it is specified in the input sources of the JSON specification.
- Another recursive interface can also be used in conjunction with process 485, preSQLGen, which is primarily the entry point for EntityFilter to run a search and generate nested implicit join filters.
- Another recursive interface, sqlWhere, can be used to generate conjuncts and disjuncts of local predicates and semi-join predicates based on implicit join transformations.
- sqlProject, sqlGroupBy, sqlOrderBy, and sqlLimitOffset can respectively be used to translate the corresponding directives in JSON spec to SQL SELECT list, GROUP BY, ORDER BY, and LIMIT/OFFSET clauses.
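- a minimal sketch, in the spirit of sqlProject, sqlGroupBy, sqlOrderBy, and sqlLimitOffset, of translating such directives into SQL clauses (the card shape matches the hypothetical dict shown earlier; this is not the actual query service code):

```python
def to_sql(card, table):
    # SELECT list from RETURNS (sqlProject).
    select = ", ".join(f'{r["field"]} AS {r["alias"]}' for r in card["RETURNS"])
    sql = f"SELECT {select} FROM {table}"
    if card.get("GROUPS"):                     # GROUP BY (sqlGroupBy)
        sql += " GROUP BY " + ", ".join(str(g["field"]) for g in card["GROUPS"])
    if card.get("ORDERS"):                     # ORDER BY (sqlOrderBy)
        sql += " ORDER BY " + ", ".join(
            f'{o["field"]} {o["dir"].upper()}' for o in card["ORDERS"])
    if "LIMIT" in card:                        # LIMIT (sqlLimitOffset)
        sql += f' LIMIT {card["LIMIT"]}'
    return sql

card = {"RETURNS": [{"field": "username", "alias": "user"},
                    {"field": "count(*)", "alias": "logins"}],
        "GROUPS": [{"field": "username"}],
        "ORDERS": [{"field": 2, "dir": "desc"}],
        "LIMIT": 100}
print(to_sql(card, "login_events"))
# SELECT username AS user, count(*) AS logins FROM login_events
# GROUP BY username ORDER BY 2 DESC LIMIT 100
```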
- data that is collected from agents and other sources may be stored in different ways.
- data that is collected from agents and other sources may be stored in a data warehouse, data lake, data mart, and/or any other data store.
- Data lakes, which store files of data in their native format, may be considered "schema on read" resources.
- any application that reads data from the lake may impose its own types and relationships on the data.
- Data warehouses are "schema on write," meaning that data types, indexes, and relationships are imposed on the data as it is stored in the enterprise data warehouse ('EDW').
- "Schema on read" resources may be beneficial for data that may be used in several contexts, and such resources pose little risk of losing data.
- “Schema on write” resources may be beneficial for data that has a specific purpose, and good for data that must relate properly to data from other sources.
- Such data stores may include data that is encrypted using homomorphic encryption, data encrypted using privacy-preserving encryption, smart contracts, non-fungible tokens, decentralized finance, and other techniques.
- IaC approaches may be enabled in a variety of ways including, for example, using IaC software tools such as Terraform by HashiCorp. Through the usage of such tools, users may define and provide data center infrastructure using JavaScript Object Notation ('JSON'), YAML, proprietary formats, or some other format.
- the configuration files may be used to emulate a cloud deployment for the purposes of analyzing the emulated cloud deployment using the systems described herein.
- the configuration files themselves may be used as inputs to the systems described herein, such that the configuration files may be inspected to identify vulnerabilities, misconfigurations, violations of regulatory requirements, or other issues.
- configuration files for multiple cloud deployments may even be used by the systems described herein to identify best practices, to identify configuration files that deviate from typical configuration files, to identify configuration files with similarities to deployments that have been determined to be deficient in some way, or the configuration files may be leveraged in some other ways to detect vulnerabilities, misconfigurations, violations of regulatory requirements, or other issues prior to deploying an infrastructure that is described in the configuration files.
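- a minimal sketch of such pre-deployment inspection, assuming the configuration is available as JSON and checking for one illustrative misconfiguration (an ingress rule open to the world); the configuration schema here is invented for illustration:

```python
import json

def find_open_ingress(config_path):
    """Return security group rules that allow ingress from anywhere."""
    with open(config_path) as f:
        config = json.load(f)
    findings = []
    for rule in config.get("security_group_rules", []):
        if rule.get("direction") == "ingress" and rule.get("cidr") == "0.0.0.0/0":
            findings.append(rule)
    return findings
```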
- the techniques described herein may be used in multi-cloud, multi-tenant, cross-cloud, cross-tenant, cross-user, industry cloud, digital platform, and other scenarios depending on specific need or situation.
- the deployments that are analyzed, monitored, evaluated, or otherwise observed by the systems described herein may be monitored to determine the extent to which a particular component has experienced “drift” relative to its associated IaC configuration.
- Discrepancies between how cloud resources were defined in an IaC configuration file and how they are currently configured in runtime may be identified and remediation workflows may be initiated to generate an alert, reconfigure the deployment, or take some other action. Such discrepancies may occur for a variety of reasons.
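- drift detection can be sketched as a comparison of declared and observed state (both represented here as flat dicts keyed by resource name, an illustrative simplification; real systems would first normalize provider-specific runtime state):

```python
def detect_drift(declared, observed):
    """Map resource name -> {"declared": ..., "observed": ...} for mismatches."""
    drift = {}
    for name, want in declared.items():
        have = observed.get(name)
        if have != want:  # missing or reconfigured relative to the IaC file
            drift[name] = {"declared": want, "observed": have}
    return drift
```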
- a zero trust security model (a.k.a., zero trust architecture) describes an approach to the design and implementation of IT systems.
- a primary concept behind zero trust is that devices should not be trusted by default, even if they are connected to a managed corporate network such as the corporate LAN and even if they were previously verified.
- Zero trust security models help prevent successful breaches by eliminating the concept of trust from an organization's network architecture.
- Zero trust security models can include multiple forms of authentication and authorization (e.g., machine authentication and authorization, human/user authentication and authorization) and can also be used to control multiple types of accesses or interactions (e.g., machine-to-machine access, human-to-machine access).
- systems described herein may be configured to collect security event logs (or any other type of log or similar record of activity) and telemetry in real time to provide customers with a security information and event management ('SIEM') or SIEM-like solution.
- SIEM technology aggregates event data produced by security devices, network infrastructure, systems, applications, or other sources. Centralizing all of the data that may be generated by a cloud deployment may be challenging for a traditional SIEM, however, as each component in a cloud deployment may generate log data or other forms of machine data, such that the collective amount of data that can be used to monitor the cloud deployment can grow to be quite large.
- a traditional SIEM architecture, where data is centralized and aggregated, can quickly result in large amounts of data that may be expensive to store, process, retain, and so on.
- the systems described herein may be used to manage, analyze, or otherwise observe deployments that include other forms of AI/ML tools.
- the systems described herein may manage, analyze, or otherwise observe deployments that include AI services.
- AI services are, like other resources in an as-a-service model, ready-made models and AI applications that are consumable as services and made available through APIs.
- organizations may access pre-trained models that accomplish specific tasks. Whether an organization needs natural language processing ('NLP'), automatic speech recognition ('ASR'), image recognition, or some other capability, AI services simply plug and play into an application through an API.
- the systems described herein may be used to manage, analyze, or otherwise observe deployments that include distributed training engines or similar mechanisms such as, for example, tools built on Dask.
- Dask is an open source library for parallel computing that is written in Python.
- Dask is designed to enable data scientists to improve model accuracy faster, as Dask enables data scientists to do everything in Python end-to-end, which means that they no longer need to convert their code to execute in environments like Apache Spark. The result is reduced complexity and increased efficiency.
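- for reference, a minimal Dask example of the pandas-style, Python-end-to-end workflow described above (the file path and column names are illustrative):

```python
import dask.dataframe as dd

# Lazily partitioned read; the same pandas-style code then runs in
# parallel across partitions when compute() is called.
df = dd.read_csv("events-*.csv")
per_user = df.groupby("username")["bytes_sent"].sum()
print(per_user.compute())  # triggers parallel execution
```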
- the systems described herein may also be used to manage, analyze, or otherwise observe deployments that include technologies such as RAPIDS (an open source Python framework which is built on top of Dask).
- RAPIDS optimizes compute time and speed by providing data pipelines and executing data science code entirely on graphics processing units (GPUs) rather than CPUs.
- Multi-cluster shared data architecture, DataFrames, and Java user-defined functions ('UDFs') are supported to enable trained models to run within a data warehouse.
- ransomware attacks are often deployed as part of a larger attack that may involve, for example:
- embodiments of the present disclosure may be configured as follows:
- the systems may include a cloud-based, SaaS-style, multitenant infrastructure.
- the systems described herein may manage, analyze, or otherwise observe deployments that include various authentication technologies, such as multi-factor authentication and role-based authentication.
- the authentication technologies may be included in the set of resources that are managed, analyzed, or otherwise observed, as interactions with the authentication technologies may be monitored.
- log files or other information retained by the authentication technologies may be gathered by one or more agents and used as input to the systems described herein.
- supply chain attacks may involve behavior that deviates from normal behavior of a cloud deployment that is not experiencing a supply chain attack, such that the mere presence of unusual activity may trigger the systems described herein to generate alerts or take some other action, even without explicit knowledge that the unusual activity is associated with a supply chain attack.
- agents, sensors, or similar mechanisms may be deployed on or near managed endpoints such as computers, servers, virtualized hardware, internet of things ('IoT') devices, mobile devices, phones, tablets, watches, other personal digital devices, storage devices, thumb drives, secure data storage cards, or some other entity.
- an endpoint protection platform may provide functionality such as:
- detecting ‘unusual activity’ may alternatively be viewed as detecting a deviation from ‘normal activity’ such that ‘unusual activity’ does not need to be identified and sought out. Instead, deviations from ‘normal activity’ may be assumed to be ‘unusual activity’.
- data may be collected from one or more agents
- other methods and mechanisms for obtaining data may be utilized.
- some embodiments may utilize agentless deployments where no agent (or similar mechanism) is deployed on one or more customer devices, deployed within a customer’s cloud deployment, or deployed at another location that is external to the data platform.
- the data platform may acquire data through one or more APIs such as the APIs that are available through various cloud services.
- one or more APIs that enable a user to access data captured by Amazon CloudTrail may be utilized by the data platform to obtain data from a customer’s cloud deployment without the use of an agent that is deployed on the customer’s resources.
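- a minimal agentless-collection sketch using the public CloudTrail lookup API via boto3 (credentials and region are assumed to come from the environment; the event name used in the filter is just an example):

```python
import boto3
from datetime import datetime, timedelta

# Query CloudTrail for recent events without deploying any agent on the
# customer's resources.
client = boto3.client("cloudtrail")
response = client.lookup_events(
    LookupAttributes=[{"AttributeKey": "EventName",
                       "AttributeValue": "ConsoleLogin"}],
    StartTime=datetime.utcnow() - timedelta(hours=1),
    EndTime=datetime.utcnow(),
)
for event in response["Events"]:
    print(event["EventName"], event.get("Username"))
```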
- agents may be deployed as part of a data acquisition service or tool that does not utilize a customer’s resources or environment.
- agents deployed on a customer’s resources or elsewhere
- mechanisms in the data platform that can be used to obtain data through one or more APIs, such as the APIs that are available through various cloud services, may be utilized.
- one or more cloud services themselves may be configured to push data to some entity (deployed anywhere), which may or may not be an agent.
- other data acquisition techniques may be utilized, including combinations and variations of the techniques described above, each of which is within the scope of the present disclosure.
- additional examples can include multi-cloud deployments, on-premises environments, hybrid cloud environments, sovereign cloud environments, heterogeneous environments, DevOps environments, DevSecOps environments, GitOps environments, quantum computing environments, data fabrics, composable applications, composable networks, decentralized applications, and many others.
- Other types of data can include, for example, data collected from different tools (e.g., DevOps tools, DevSecOps tools, GitOps tools), different forms of network data (e.g., routing data, network translation data, message payload data, Wi-Fi data, Bluetooth data, personal area networking data, payment device data, near field communication data, metadata describing interactions carried out over a network, and many others), data describing processes executing in a container, lambda, EC2 instance, virtual machine, or other execution environment, information describing the execution environment itself, and many other types of data.
- Fig. 5A sets forth a system for providing many of the features described herein for user devices as a distributed edge service in accordance with some embodiments of the present disclosure.
- the system depicted in Fig. 5A includes a distributed edge platform 510.
- the distributed edge platform 510 may be similar to the systems described above where the distributed edge platform 510 can be used to perform tasks such as, for example, anomaly detection, threat detection, vulnerability detection, compliance monitoring, and many others.
- the distributed edge platform 510 may be deployed in a distributed fashion, such that instances of the distributed edge platform 510 are deployed on geographically distributed execution environments, as will be described in greater detail below.
- the distributed edge platform 510 depicted in Fig. 5A may be utilized to provide for continuous risk behavior based security to user devices.
- Such security is ‘continuous’ in the sense that data regarding the activity of a user device is continuously gathered and evaluated in real time (or near real time) rather than through batch-based evaluation.
- Such security is ‘behavior based’ in the sense that various behaviors of the user device are used as the primary inputs into security evaluations. In many cases, such behaviors may not be concerning on their own, but instead may represent a deviation from typical device activity that warrants additional investigation. For example, if the user device is connecting to a cloud deployment, creating EC2 instances, and executing software on those EC2 instances, those steps alone may not be concerning.
- the private environment 502 is ‘private’ in the sense that it is not available for public consumption.
- the private environment 502 depicted in Fig. 5A is supporting a plurality of resources 504a-n that may include, for example, software applications, databases, file systems, various services (e.g., whether local or SaaS offerings), development tools (e.g., automation servers, code repositories, etc.), and so on.
- access between the private environment 502 and the distributed edge platform 510 may involve the zero trust 508 authentication service, as any participants in data communications between the private environment 502, the distributed edge platform 510, and the user devices 512, 514, 522 that are monitored by the distributed edge platform 510 must be authenticated by the zero trust 508 authentication service.
- the zero trust 508 authentication service may be embodied, for example, as a third party authentication service such as Okta, or in some other way.
- the distributed edge platform 510 may be integrated with (and may leverage) a variety of third-party tools such as identity and access management tools, mobile device management ('MDM') tools, and so on.
- a user device 512 can access the resources 504a-n in the private environment 502 via the zero trust 508 authentication service.
- the user device 512 need not connect to a VPN, go through a firewall, or use a similar mechanism in order to securely access the resources 504a-n in the private environment 502.
- the distributed edge platform 510 may implement policies that identify which users may access which resources 504a-n, what privileges each user has, and other policies so that the user device 512 is only routed to (and given access to) a particular resource 504a-n in the private environment 502 if the policies allow for such access.
- the distributed edge platform 510 can do all sorts of behavior analysis (i.e., analysis of device activity and user activity as described below) to add in anomaly detection capabilities, threat assessment, risk assessment, and other security related capabilities as described elsewhere in the present disclosure.
- the example depicted in Fig. 5A also includes a SaaS environment 518 that is supporting a plurality of resources 520a-n.
- the SaaS environment 518 may be embodied, for example, as a public cloud that is accessible using a particular account, as an environment provided by the vendor of some SaaS offering, or in some other way.
- the SaaS environment 518 may include a collection of hardware resources, software resources, networking resources, and other resources so that the SaaS environment 518 can be used to provide a vendor of software that is consumed as-a-service to offer their SaaS products.
- the SaaS offerings can include any software offered as-a-service including, for example, Salesforce, Office365, and many others.
- the resources 520a-n may include software applications, databases, file systems, or anything else offered as-a-service.
- a user device 522 can access the resources 526a-n in the public internet 528 via a network security 524 module.
- the network security 524 module may be embodied, for example, as one or more modules of computer program instructions that are configured to perform tasks such as SSL inspection, DNS inspection, and other tasks described in the present disclosure.
- the network security module may be configured to protect the user device 522 from malware intrusions, computer viruses, or other threats that can originate from the public internet 528.
- the distributed edge platform 510 may implement policies that identify which users may access which resources 520a-n, what privileges each user has, and other policies so that the user device 514 is only able to access various resources if the policies allow for such access.
- the distributed edge platform 510 can do all sorts of behavior analysis (i.e., analysis of device activity and user activity as described below) to add in anomaly detection capabilities, threat assessment, risk assessment, and other security related capabilities as described elsewhere in the present disclosure.
- the distributed edge platform 510 depicted in Fig. 5A includes a real-time visibility and policy enforcement 530 module.
- the real-time visibility and policy enforcement 530 module may be embodied, for example, as one or more modules of computer program instructions executing on computer hardware, virtualized hardware, or in some other execution environment.
- the real-time visibility and policy enforcement 530 module may be configured to carry out many of the steps described above such as, for example, monitoring user activity (e.g., via data communications involving a user device or in some other way) to enforce various policies describing how the user devices may be utilized, what resources the user devices may access, what privileges the user device has, and so on.
- the distributed edge platform 510 depicted in Fig. 5A also includes an events, workflows, and auto management 534 module.
- the events, workflows, and auto management 534 module may be embodied, for example, as one or more modules of computer program instructions executing on computer hardware, virtualized hardware, or in some other execution environment.
- the events, workflows, and auto management 534 module may be configured to generate alerts, initiate a remediation workflow (or some other workflow), or perform other automatic remediation tasks as described in greater detail elsewhere in the present disclosure.
- Fig. 5B sets forth a system for providing many of the features described herein for user devices as a distributed edge service in accordance with some embodiments of the present disclosure.
- the various instances of the distributed edge platform 510a, 510b, 510c, 510d may be updated in a coordinated fashion and may share access to the same information such that each instance operates in the same manner as any other instance, even if two instances are deployed in different ways (e.g., on different underlying resources).
- the example method depicted in Fig. 6 includes identifying 602 a location of a device that is associated with a user.
- the device that is associated with a user may be embodied, for example, as a smartphone, as a tablet computer, as a laptop computer, or as some other device.
- the device may be associated with the user because the user is logged into the device, the device has been designated for use by the user (e.g., the device is a company issued laptop provided by the user’s employer), or the device is otherwise associated with the user.
- the location of the device may be embodied, for example, as a city and state (e.g., Los Angeles, CA), as a label of a known location (e.g., ‘home’), or designated in some other way.
- the source IP address of a data communications packet generated by the device may be used to determine the location of the device.
- if the locations of wireless networks or wireless access points that the device can connect to and the location of the source IP address of a data communications packet generated by the device match each other, the matching location may be identified 602 as being the location of the device.
- the mechanisms for determining a device’s location will be explained in greater detail below.
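- a minimal sketch of corroborating the two signals described above; geolocate_ip and geolocate_access_points are hypothetical stand-ins for an IP-geolocation service and a Wi-Fi positioning source, not real APIs:

```python
def geolocate_ip(ip):
    """Stand-in for an IP-geolocation service lookup (hypothetical)."""
    return {"71.198.44.40": "San Francisco, CA"}.get(ip)

def geolocate_access_points(visible_ssids):
    """Stand-in for a Wi-Fi positioning lookup (hypothetical)."""
    return "San Francisco, CA" if "office-wifi" in visible_ssids else None

def identify_location(source_ip, visible_ssids):
    ip_location = geolocate_ip(source_ip)
    wifi_location = geolocate_access_points(visible_ssids)
    # Only report a location when both signals agree (step 602).
    return ip_location if ip_location == wifi_location else None

print(identify_location("71.198.44.40", {"office-wifi"}))  # San Francisco, CA
```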
- the example method depicted in Fig. 6 also includes determining 604 device activity associated with the user.
- the device activity associated with the user can include information describing the usage of the device.
- the device activity can include, for example, information describing applications that are being executed or utilized on the device, information describing how those applications are being utilized, information describing files being accessed via the device, information describing data sources being accessed via the device, information describing data communications coming into and flowing out of the device, and many others.
- the device activity can be used to ascertain how a device is being used, when the device is being used, where the device is being used, or any other quantifiable aspect of device usage.
- the example method depicted in Fig. 6 also includes determining 606, based on a profile associated with the user, that the device activity associated with the user deviates from normal activity for the user.
- the profile associated with the user may include information describing normal activity for the user.
- the normal activity for the user may be determined, for example, based on historical usage of the device (or some other device) associated with the user. That is, normal activity may be learned through an analysis of how the device has historically been used rather than being specified exclusively as a set of rules.
- comparisons between the device activity and the profile associated with the user may utilize ranges, thresholds, or similar concepts to allow for minor deviations between the device activity and the profile associated with the user. For example, if the profile associated with the user indicates that the device is typically located at the user's office between the hours of 9 AM and 5 PM on weekdays, but the device activity reveals that the user is still at the office at 6 PM on a particular Tuesday evening, this minor deviation may be tolerated and may not rise to the level of triggering an alarm. In contrast, if the device activity reveals that the user is at the office at 2:30 AM on a Sunday morning, this may be viewed as being a larger deviation.
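- a minimal sketch of such tolerance-based comparison (the one-hour tolerance and the profile shape are illustrative assumptions; a production version would also account for weekday/weekend context and wrap-around at midnight):

```python
def hours_outside(profile_window, observed_hour):
    start, end = profile_window                  # e.g., (9, 17) for 9 AM-5 PM
    if start <= observed_hour <= end:
        return 0
    return min(abs(observed_hour - start), abs(observed_hour - end))

def is_anomalous(profile_window, observed_hour, tolerance_hours=1):
    return hours_outside(profile_window, observed_hour) > tolerance_hours

print(is_anomalous((9, 17), 18))   # False: 6 PM is within tolerance
print(is_anomalous((9, 17), 2.5))  # True: 2:30 AM is a large deviation
```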
- determining 604 device activity associated with the user can also include identifying 706 application behavior for the one or more applications.
- Identifying 706 application behavior for the one or more applications can include, for example, identifying the data communications endpoints that the applications have communicated with, identifying data that has been accessed by the applications, identifying users or accounts that have utilized the applications, identifying the time that the application was used, and so on.
- Such application behavior can include any quantifiable aspect describing how the application was used, when the application was used, by whom the application was used, and so on.
- the device activity associated with the user may be continuously monitored for deviation from normal activity.
- the device activity may be continuously monitored for deviation from normal activity, for example, by determining whether device activity associated with the user deviates from normal activity for the user (as specified in a user profile) every time that a change to the device activity associated with the user occurs, as part of a process that is always executing on some computing resources, or in some other way.
- the systems described herein may provide real-time or near real-time detection of a deviation from normal activity, rather than batch processing or another form of delayed processing of device activity to determine whether such activity deviates from normal activity.
- Fig. 9 sets forth a flow chart illustrating an example method of establishing a location profile for a user device in accordance with some embodiments of the present disclosure.
- the example method depicted in Fig. 9 includes gathering 902 information associated with the location of a user device. Gathering 902 information associated with the location of a user device may be carried out not only by gathering geolocation information, but also gathering information that may be useful in determining the nature of a particular location. For example, information may be gathered 902 to determine whether the location of the user device is a location where the user device is frequently utilized, information may be gathered 902 to determine whether the location of the user device is a location where utilizing the user device to perform certain functions is allowed or prohibited, and so on.
- the information associated with the location of a user device may be gathered 902 not just for the purposes of determining where the device would be located on a map, but to use the location of the device as an input to determining whether a user or a user device is exhibiting anomalous behavior.
- examining data communications traffic sent by the user device and extracting the client IP address that is associated with the data communications traffic may be used in gathering 902 information associated with the location of the user device.
- the client IP address may be used to query one or more of a variety of services that convert IP addresses to geolocation information (e.g., a city/state, latitude and longitude, and so on).
- other embodiments could leverage image sensors or other capabilities of the user device to gather 902 information that may be informative for the purposes of identifying a user device's location.
- because the information associated with the location of a user device may be gathered 902 for reasons that extend well beyond the context of a geolocation, the information associated with the location of a user device may be information that is more useful in describing the nature of a location rather than an absolute physical location.
- the particular set of device types that the user device can communicate with may be indicative of a location.
- the user device can detect the presence of a thermostat, a smart refrigerator, multiple smart TVs, a router, and the entertainment system of an automobile via its Wi-Fi adapter or Bluetooth adapter.
- this combination of reachable device types may be taken as an indication that the user device is at a private residence.
- this combination of reachable devices may be taken as an indication that the user device is at a public location (perhaps a coffee shop).
- information may be gathered 902 that can be used to determine a relative location, a type of location, characteristics of a location, or some other information that may be associated with a particular location.
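- a minimal sketch of characterizing a location from reachable device types, as in the examples above (the signature sets are illustrative assumptions):

```python
# Device-type signatures suggestive of a location's nature.
HOME_SIGNATURE = {"thermostat", "smart_refrigerator", "smart_tv"}
PUBLIC_SIGNATURE = {"point_of_sale", "many_unknown_phones"}

def characterize_location(reachable_device_types):
    if HOME_SIGNATURE & reachable_device_types:
        return "private residence"
    if PUBLIC_SIGNATURE & reachable_device_types:
        return "public location"
    return "unknown"

print(characterize_location({"thermostat", "smart_tv", "router"}))
# private residence
```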
- a user device may be expected to be used in one way when the user device is located at the user’s home, the user device may be expected to be used in another way when the user device is located at the user’s office, the user device may be expected to be used in yet another way when the user device is located in the user’s automobile, and so on.
- a location may only be determined 904 to be a ‘known location’ if the location has an associated set of user behaviors that would be expected to be observed when the user device is located at the ‘known location’.
- a location is not necessarily ‘known’ in the sense that the device’s geolocation is known, although in some embodiments a known location may include a specific geolocation (e.g., 123 Avenue A, San Francisco, CA). For example, while it may not be possible to determine the mailing address or street address where the user device is located, the detected presence of a particular set of other devices may be sufficient to determine that the user device is at a ‘home’ location, even if the exact street address of the home cannot be determined.
- detecting that the user device is located at a particular set of GPS coordinates may be insufficient for determining that the location of the user device is ‘known’ if the set of GPS coordinates has no known relationship to a location where the user device has previously been used or is otherwise associated with a set of behaviors that would be expected to be observed when the user device is at the GPS coordinates.
- the location of the user device may only be determined to be a ‘known’ location if the user device has previously been accessed at the location or at some other location where utilization of the user device would be expected to be similar.
- this location may be determined to be a ‘known’ location by virtue of the fact that the user device has been used in the past at other hotels (even if the user device has not been previously used at the exact hotel that it is now located at).
- the example method depicted in Fig. 9 also includes, responsive to affirmatively 906 determining that the user device is being accessed at a known location, determining 908 a characterization of the known location.
- a ‘characterization’ of the known location may be embodied as description of the location that can be associated with some set of behaviors that would be expected from the user device or the user by virtue of the user device being located at a location that is characterized in a certain way.
- the characterization is a ‘home’ location.
- a set of expected behaviors may be associated with the user or the user device by virtue of the user device being at home.
- a cluster of locations at which the user device was utilized may be identified as locations within the user’s office complex, such that the user being on the northeast side of the office building and the user being on the southwest side of the office building are not treated as distinct locations, but are instead treated as the user being within the same density area that represents a single logical location.
- a cluster of locations that represent a commonly traversed path may be treated as a single entity.
- the trained model may learn a user’s daily route to work and may treat all of the individual locations along that route as a single entity.
- particular locations may be labelled (e.g., home, office, etc. . .) as part of the training process or after the model has been trained.
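- one way to collapse nearby fixes into a single logical location is density-based clustering; a minimal sketch using DBSCAN with a haversine metric follows (the ~500 m radius and the coordinates are illustrative assumptions, not the trained model described above):

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Two fixes inside the same office complex plus one at home; haversine
# distances are computed on coordinates expressed in radians.
coords_deg = np.array([[37.7898, -122.3942],   # office, NE corner
                       [37.7890, -122.3951],   # office, SW corner
                       [37.7510, -122.4470]])  # home
coords_rad = np.radians(coords_deg)
earth_radius_m = 6_371_000
labels = DBSCAN(eps=500 / earth_radius_m, min_samples=1,
                metric="haversine").fit_predict(coords_rad)
print(labels)  # e.g., [0 0 1]: both office fixes share one logical location
```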
- Fig. 14 sets forth a flow chart illustrating an example method of detecting deviation from normal behavior of a user device in accordance with some embodiments of the present disclosure.
- the example method depicted in Fig. 14 is similar to the example method depicted in Figs. 12 and 13, as Fig. 14 also includes generating 1202 a trained model for detecting normal activity for the user device, gathering 1204 information describing current activity associated with the user device, and determining 1206 whether the user device has deviated from normal activity.
- the example method depicted in Fig. 14 also includes initiating 1406 a remediation workflow after determining that the user device has deviated from normal activity.
- Initiating 1406 a remediation workflow may be carried out as part of an auto-remediation capability that may be provided to the monitored devices.
- the remediation workflow may be configured to perform a variety of tasks including, for example, restricting the access of the user device to certain data, restricting the usage of certain applications on the user device, enabling some feature on the user device (e.g., communications over an unsecured network is detected so a data encryption feature for data communications is enabled), or performing some other function.
- the user-specific polygraph 1500 depicted in Fig. 15A also includes information describing applications accessed by the device, depicted here as internal sales application 1508, web browser application 1510, and messaging application 1512.
- Information describing the applications that are accessed by the user device may be obtained as described in greater detail above.
- the representation of the applications may include, for example, a name of the application, a name of the binary, a custom label for the application, or any other information associated with the application including information describing the usage of the application.
- Such application-related information may be visible in the original presentation of the polygraph or may be accessible in other ways (e.g., hovering a mouse over the user icon).
- the links between the web browser application 1510 and the endpoints 1516, 1518, 1520, 1522 may include information describing how long the connection has been established, how much data has been transferred since the connection was established, and so on.
- other data associated with the links between two icons may also be enriched with data that may be visible in the default view of the polygraph or accessible in some other way as described above.
- the user-specific polygraph 1600 depicted in Fig. 16 also includes information describing remotely disposed or executed applications accessed by depicted user devices (e.g., by the primary device 1608 and unknown device 1610), depicted here as applications 1620 and 1622.
- information describing the applications that are accessed by the user device may be obtained by accessing logs maintained by the applications. Such logs may be accessed via an Application Program Interface (API), received via an event streaming platform (e.g., Kafka), received from particular agents, and the like.
- API Application Program Interface
- Information describing the applications may also be generated and received from a distributed edge platform, proxy, or other intermediary for user network activity.
- Application-related information may be visible in the original presentation of the polygraph or may be accessible in other ways (e.g., hovering a mouse over the user icon).
- the application 1620 and application 1622 may include private applications. Private applications are those applications executed in a data center or other execution environment associated with the organization and are not maintained by a third-party service provider as with SaaS applications.
- the user-specific polygraph 1700 depicted in Fig. 17 describes network activity with respect to a user for remotely accessing particular documents, to be described in further detail below.
- the user-specific polygraph 1700 depicted in Fig. 17 includes a representation of a user and information describing the user, in this case designated as User ABC 1702.
- the user-specific polygraph 1700 also includes information relating to the times at which particular network activity occurred, shown as work hours 1704.
- the user-specific polygraph 1700 depicted in Fig. 17 also includes information describing the location of the user (as represented by the location of the device that is being used by the user), depicted here as San Francisco, CA 1706.
- the userspecific polygraph 1700 depicted in Fig. 17 also includes information describing a user device used by a user, depicted here as primary device 1708.
- the user-specific polygraph 1700 depicted in Fig. 17 also includes a representation of an executable used to access remote documents, shown here as file browser application 1710.
- One skilled in the art will appreciate that other types of executables may also be used to access documents, including a web browser for a web-based interface, a file transfer client such as a File Transfer Protocol ('FTP') client, a version control system client, or other executables as can be appreciated.
- the user-specific polygraph 1700 depicted in Fig. 17 also includes information describing types of access performed by the user. Here, the access types are grouped both by rarity (e.g., "common" or "rare") and by whether a read or a write was performed.
- the access types include common read 1712, common write 1714, rare read 1716, and rare write 1718.
- the user-specific polygraph 1700 depicted in Fig. 17 also includes information describing documents accessed by the user, shown as portable document format (PDF) document 1720, word processing document 1722, and website source document 1724. As shown in the user-specific polygraph 1700, the user is shown to have performed common reads on the PDF document 1720 and word processing document 1722 and common writes on the word processing document 1722. The user also is shown to have performed rare reads and rare writes on the website source document 1724.
- an agent or other entity on the user device may provide information describing activity of the user device, including information with respect to access to shadow applications, to facilitate generation of the user-specific polygraph 1800.
- the particular shadow applications accessed by the user may then be determined by a variety of approaches. For example, domain names, network addresses (e.g., internet protocol ('IP') addresses), or other data indicated in the information may be compared to a database, table, or other resource associating particular applications with particular data points in the information.
- the user-specific polygraph 1900 depicted in Fig. 19 evaluates the risk of accessing particular websites by the user.
- Information describing network activity with respect to browsing particular websites may be gathered, for example, from particular user devices, from agents executed on the particular user devices, from distributed edge platforms or proxies serving as intermediaries to the user devices, and the like.
- the user-specific polygraph 1900 depicted in Fig. 19 includes a representation of a user and information describing the user, in this case designated as User ABC 1902.
- the user-specific polygraph 1900 depicted in Fig. 19 also includes information describing times during which access to particular websites by the user occurred, shown as work hours 1904, intermittent hours 1906, and rare hours 1908.
- the user-specific polygraph 1900 depicted in Fig. 19 also includes information describing the location of the user (as represented by the location of the device that is being used by the user), depicted here as San Francisco, CA 1910.
- the user-specific polygraph 1900 depicted in Fig. 19 also includes information describing user devices used by a user, depicted here as primary device 1912 and unknown device 1914.
- the user-specific polygraph 1900 depicted in Fig. 19 also includes representations of executables used to access particular websites, shown here as web browser application 1916 and web browser application 1918.
- the user-specific polygraph 1900 depicted in Fig. 19 includes representations for grouping websites into different categories of risk, shown as low risk 1920 and high risk 1924.
- low risk 1920 websites include travel site 1926 and e-commerce site 1928.
- High risk 1924 websites include news site 1930 and software distribution site 1932.
- generating the user-specific polygraph 1900 includes calculating, for one or more websites to be included in the user-specific polygraph 1900, a risk score.
- Normal browsing activity for the user may be established by various approaches set forth above, including training a model based on browsing activity for the user.
- determining the risk associated with browsing a particular website may include providing information describing the user accessing that website to a trained model.
- the model may output, for example, a degree of deviation associated with the user that then may be provided as input to another model or to an algorithm for calculating a risk score for accessing the website.
- the model may also output the risk score itself.
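- For illustration only, the following Python sketch shows one way a deviation output from such a model could be combined with a per-site prior into a risk score and a low/high bucket. The weighting and threshold are assumptions of this sketch; the disclosure leaves the scoring algorithm open.

```python
# A minimal sketch of turning a model's deviation output into a website
# risk score and a risk bucket. Weights and threshold are illustrative.
def risk_score(deviation, base_site_risk=0.0, deviation_weight=0.7):
    """Combine behavioral deviation (0..1) with a per-site prior (0..1)."""
    score = deviation_weight * deviation + (1 - deviation_weight) * base_site_risk
    return min(max(score, 0.0), 1.0)

def risk_bucket(score, threshold=0.5):
    return "high risk" if score >= threshold else "low risk"

score = risk_score(deviation=0.9, base_site_risk=0.4)
print(score, risk_bucket(score))  # 0.75 high risk
```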
- the user-specific polygraphs of Figs. 16-19 may also include alerts in response to various criteria. For example, where network activity deviates from normal activity for the user (e.g., as described above using trained models, periodically retrained models, and the like), an alert may be presented inline with the user-specific polygraph. Additional remedial actions may also be taken in response to determining that the network activity for the user deviates from normal activity for the user, including generating additional alerts, initiating a remediation workflow, and other actions as can be appreciated.
- Fig. 20 sets forth a flowchart of an example method for generating user-specific polygraphs for network activity according to some embodiments of the present disclosure.
- the method of Fig. 20 includes gathering 2002 information describing network activity associated with a user.
- the information describing network activity associated with a user may describe various destinations or resources accessed by the user, such as particular applications (e.g., SaaS applications, private applications, shadow applications), documents, websites, and the like.
- the information describing network activity associated with the user may describe particular types of access, such as reads or writes, amounts of data transferred, durations of network connections, and the like.
- the information describing network activity associated with the user may also include a time at which the network activity occurred.
- the time may be expressed as a particular point in time during which a network connection was established or ended, or a particular point in time in which a particular transfer of data occurred.
- the time may also be expressed as a time range, such as a duration of a session or network connection, a duration of a data transfer, and the like.
- the information describing network activity associated with the user may also include a location of the user corresponding to the activity, or various data points from which a location of the user may be determined according to the approaches set forth above.
- the information describing network activity associated with the user may also include identifiers for particular devices or executables used by the user during the network activity.
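- As a non-limiting illustration, the fields described above could be gathered into a record such as the following Python sketch. The field names and schema are assumptions for illustration; the disclosure does not prescribe a particular data layout.

```python
# A minimal sketch of a record for one unit of gathered network activity.
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

@dataclass
class NetworkActivityRecord:
    user_id: str
    destination: str            # e.g., a SaaS app, document, or website
    access_type: str            # e.g., "read" or "write"
    bytes_transferred: int
    started_at: datetime        # point in time the connection was established
    duration_seconds: Optional[float] = None  # None when expressed as a point in time
    location: Optional[str] = None            # or raw data points to derive it from
    device_id: Optional[str] = None
    executable: Optional[str] = None

rec = NetworkActivityRecord(
    user_id="ABC",
    destination="saas:crm.example.com",
    access_type="read",
    bytes_transferred=52_431,
    started_at=datetime(2022, 8, 31, 9, 15),
    duration_seconds=12.4,
    location="San Francisco, CA",
    device_id="primary",
    executable="web browser",
)
print(rec)
```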
- the information describing network activity associated with the user is gathered from a destination of the network activity.
- the information may include logs generated by an application (e.g., a SaaS application or a private application) accessed by the user. Such logs may be provided by the application via an exposed API, by an event streaming platform, by an agent, or by another approach as can be appreciated.
- the information describing the network activity may be gathered from an intermediary for network traffic between a user device and the destination.
- a distributed edge platform, proxy, or other network traffic intermediary may store and provide information describing network activity associated with the user.
- Such an intermediary may be executed on the user device itself, or on another computing device or execution environment in communication with the user device.
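- For illustration only, the following Python sketch pulls activity logs from a destination that exposes them over HTTP. The endpoint URL, token, and response shape are hypothetical; real applications each define their own log APIs, and event streaming platforms or agents would be integrated differently.

```python
# A minimal sketch of gathering logs from a destination via an exposed API.
import json
import urllib.request

def fetch_activity_logs(base_url, token, user_id):
    """Fetch JSON activity logs for one user from a hypothetical endpoint."""
    req = urllib.request.Request(
        f"{base_url}/logs?user={user_id}",
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# Usage (against a hypothetical deployment):
# logs = fetch_activity_logs("https://app.example.com/api", "TOKEN", "ABC")
```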
- the method of Fig. 20 also includes generating 2004, based on the information, a user-specific polygraph that includes one or more destinations associated with the network activity.
- the user-specific polygraph includes the one or more destinations in that the user-specific polygraph includes one or more representations for the one or more destinations.
- the user-specific polygraph may be, for example, any of the user-specific polygraphs described above.
- the one or more destinations may share a same destination type (e.g., SaaS applications, private applications, documents, shadow applications, websites, and the like).
- the user-specific polygraph may include representations for one or more access types for the network activity of the user.
- the access types may differ according to frequency or rarity of access (e.g., common access, rare access, and the like).
- the access types may differ according to a particular type of operation performed (e.g., read, write).
- the access types differ according to a combination of rarity and type of operation (e.g., common read, common write, rare read, rare write).
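- As a non-limiting illustration of such a combination, the following Python sketch labels an access by rarity and operation. The frequency threshold is an assumption of this sketch; the disclosure only requires that access types distinguish common from rare access.

```python
# A minimal sketch of labeling an access by rarity and operation,
# based on how often the user has performed it before.
from collections import Counter

def access_type(history: Counter, destination: str, operation: str,
                common_threshold: int = 10) -> str:
    """Label an access "common"/"rare" by how often this user has
    performed this operation on this destination before."""
    rarity = "common" if history[(destination, operation)] >= common_threshold else "rare"
    return f"{rarity} {operation}"

history = Counter({("doc:report.docx", "read"): 42, ("doc:site-src", "write"): 1})
print(access_type(history, "doc:report.docx", "read"))   # common read
print(access_type(history, "doc:site-src", "write"))     # rare write
```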
- the user-specific polygraph may include representations for particular executables used by the user in order to access the particular destinations.
- these representations may be generated or determined according to similar approaches as are set forth above.
- generating the user-specific polygraph includes encoding, into the user-specific polygraph, particular attributes corresponding to particular representations included in the user-specific polygraph, or corresponding to particular links in the user-specific polygraph.
- these attributes are presented or displayed in the polygraph in response to a user interaction with a particular representation or link (e.g., clicking, hovering, and the like), thereby providing a more detailed view of particular elements as desired by a user.
- these attributes are presented or displayed without requiring such user interaction.
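- For illustration only, the following Python sketch encodes per-node attributes into a serialized polygraph so that a front end could reveal them on click or hover. The JSON shape is an assumption of this sketch; any serialization the rendering layer understands would do.

```python
# A minimal sketch of encoding attributes on polygraph representations
# and links, for display in response to user interaction.
import json

polygraph = {
    "nodes": [
        {"id": "user:ABC", "attrs": {"department": "engineering"}},
        {"id": "site:news", "attrs": {"risk": "high", "last_access": "2022-08-30"}},
    ],
    "links": [
        {"source": "user:ABC", "target": "site:news",
         "attrs": {"access_count": 3, "hours": "rare"}},
    ],
}

def attributes_for(element_id):
    """Look up the attributes to display when the user interacts with a node."""
    for node in polygraph["nodes"]:
        if node["id"] == element_id:
            return node["attrs"]
    return {}

print(json.dumps(attributes_for("site:news")))  # {"risk": "high", ...}
```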
- the user-specific polygraph may include other information as is described above, different information, less information, and so forth. Such varying combinations of information and features are also contemplated within the scope of the present disclosure.
- Network activity for the website may diverge from normal based on attributes such as a time of accessing the website, a device used to access the website, a location from which the website was accessed, a frequency at which the website is generally accessed, an executable (e.g., a browser) used to access the website, and other criteria as can be appreciated.
- the risk for a particular website may be based on site-specific risk attributes.
- site-specific risk attributes may include knowledge that the website distributes malware, that accounts associated with the website have suffered a data breach, that the website is stored in a particular geographic location, that the website is controlled by a known malicious party, and other attributes as can be appreciated.
- site-specific risk attributes may be determined by accessing a database or third-party website tracking such risk attributes, by performing scans on the website, and by other approaches.
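- As a non-limiting illustration, the following Python sketch folds site-specific risk attributes into a per-site prior of the kind used by the earlier risk-score sketch. The attribute set and weights are assumptions; in practice they would come from a threat-intelligence database, third-party tracker, or site scans.

```python
# A minimal sketch of deriving a site's prior risk from its
# site-specific risk attributes.
SITE_RISK_WEIGHTS = {
    "distributes_malware": 0.9,
    "breached_accounts": 0.6,
    "high_risk_geography": 0.3,
    "known_malicious_operator": 1.0,
}

def site_risk_prior(attributes):
    """Return the strongest risk signal (0..1) among a site's attributes."""
    return max((SITE_RISK_WEIGHTS.get(a, 0.0) for a in attributes), default=0.0)

print(site_risk_prior({"breached_accounts", "high_risk_geography"}))  # 0.6
```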
- the method of Fig. 22 further differs from Fig. 20 in that generating 2004 the user-specific polygraph also includes including 2204, in the user-specific polygraph, an alert that the network activity deviates from the normal activity for the user.
- the alert may include pop-up notifications, windows, or other user interface elements, as can be appreciated, that indicate the deviation.
- these alerts may be anchored to particular representations or links in the user-specific polygraph in order to indicate particular factors that contribute to the deviations from normal activity.
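- For illustration only, the following Python sketch anchors an alert to the graph element that contributed to the deviation, reusing the JSON polygraph shape from the earlier sketch. The alert fields are assumptions for illustration.

```python
# A minimal sketch of anchoring an inline alert to a particular
# representation in the user-specific polygraph.
def attach_alert(polygraph, element_id, message):
    """Annotate a node with an inline alert so the UI can render it there."""
    for node in polygraph["nodes"]:
        if node["id"] == element_id:
            node.setdefault("alerts", []).append(
                {"message": message, "severity": "warning"}
            )
            return True
    return False

polygraph = {"nodes": [{"id": "device:unknown"}], "links": []}
attach_alert(polygraph, "device:unknown",
             "Access from an unknown device deviates from normal activity")
print(polygraph["nodes"][0]["alerts"])
```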
- a method of generating user-specific polygraphs for network activity, comprising: gathering information describing network activity associated with a user; and generating, based on the information, a user-specific polygraph that includes one or more destinations associated with the network activity.
Landscapes
- Engineering & Computer Science (AREA)
- Computer Security & Cryptography (AREA)
- Computer Hardware Design (AREA)
- General Engineering & Computer Science (AREA)
- Software Systems (AREA)
- Theoretical Computer Science (AREA)
- Computing Systems (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The invention concerns generating user-specific polygraphs for network activity, including: gathering information describing network activity associated with a user and generating, based on the information, a user-specific polygraph that includes one or more destinations associated with the network activity.
Applications Claiming Priority (3)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US202163240818P | 2021-09-03 | 2021-09-03 | |
| US17/810,978 US11916947B2 (en) | 2017-11-27 | 2022-07-06 | Generating user-specific polygraphs for network activity |
| PCT/US2022/042247 WO2023034444A1 (fr) | 2021-09-03 | 2022-08-31 | Génération de polygraphes spécifiques à l'utilisateur pour une activité de réseau |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| EP4397002A1 true EP4397002A1 (fr) | 2024-07-10 |
Family
ID=91435094
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| EP22778115.0A Pending EP4397002A1 (fr) | 2021-09-03 | 2022-08-31 | Génération de polygraphes spécifiques à l'utilisateur pour une activité de réseau |
Country Status (1)
| Country | Link |
|---|---|
| EP (1) | EP4397002A1 (fr) |
- 2022-08-31: EP application EP22778115.0A filed; published as EP4397002A1 (fr); status: active, pending
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| US11916947B2 (en) | Generating user-specific polygraphs for network activity | |
| US11909752B1 (en) | Detecting deviations from typical user behavior | |
| US11895135B2 (en) | Detecting anomalous behavior of a device | |
| US11741238B2 (en) | Dynamically generating monitoring tools for software applications | |
| US20230254330A1 (en) | Distinguishing user-initiated activity from application-initiated activity | |
| US20230075355A1 (en) | Monitoring a Cloud Environment | |
| US20220360600A1 (en) | Agentless Workload Assessment by a Data Platform | |
| US12526297B2 (en) | Annotating changes in software across computing environments | |
| US12095879B1 (en) | Identifying encountered and unencountered conditions in software applications | |
| US20240106846A1 (en) | Approval Workflows For Anomalous User Behavior | |
| US12309181B1 (en) | Establishing a location profile for a user device | |
| US20230328086A1 (en) | Detecting Anomalous Behavior Using A Browser Extension | |
| US12095794B1 (en) | Universal cloud data ingestion for stream processing | |
| US12309236B1 (en) | Analyzing log data from multiple sources across computing environments | |
| US12261866B1 (en) | Time series anomaly detection | |
| US12363148B1 (en) | Operational adjustment for an agent collecting data from a cloud compute environment monitored by a data platform | |
| WO2024044053A1 (fr) | Évaluation et remédiation de scénario de risque de ressources en nuage | |
| WO2023038957A1 (fr) | Surveillance d'un pipeline de développement logiciel | |
| WO2023034419A1 (fr) | Détection de comportement anormal d'un dispositif | |
| US12463994B1 (en) | Handling of certificates by intermediate actors | |
| US12463995B1 (en) | Tiered risk engine with user cohorts | |
| US12483576B1 (en) | Compute resource risk mitigation by a data platform | |
| US12463997B1 (en) | Attack path risk mitigation by a data platform using static and runtime data | |
| US12463996B1 (en) | Risk engine that utilizes key performance indicators | |
| US12355787B1 (en) | Interdependence of agentless and agent-based operations by way of a data platform |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | STAA | Information on the status of an ep patent application or granted ep patent | Free format text: STATUS: UNKNOWN |
| | STAA | Information on the status of an ep patent application or granted ep patent | Free format text: STATUS: THE INTERNATIONAL PUBLICATION HAS BEEN MADE |
| | PUAI | Public reference made under article 153(3) epc to a published international application that has entered the european phase | Free format text: ORIGINAL CODE: 0009012 |
| | STAA | Information on the status of an ep patent application or granted ep patent | Free format text: STATUS: REQUEST FOR EXAMINATION WAS MADE |
| | 17P | Request for examination filed | Effective date: 20240208 |
| | AK | Designated contracting states | Kind code of ref document: A1; Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR |
| | DAV | Request for validation of the european patent (deleted) | |
| | DAX | Request for extension of the european patent (deleted) | |