WO2019228613A1 - Device and method for detecting malicious domain names
- Publication number
- WO2019228613A1 (application PCT/EP2018/064092)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- fqdn
- malicious
- sequence
- domain
- result
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/14—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
- H04L63/1441—Countermeasures against malicious traffic
- H04L63/145—Countermeasures against malicious traffic the attack involving the propagation of malware through the network, e.g. viruses, trojans or worms
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
- G06N3/0442—Recurrent networks, e.g. Hopfield networks characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/09—Supervised learning
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/02—Network architectures or network communication protocols for network security for separating internal from external traffic, e.g. firewalls
- H04L63/0227—Filtering policies
- H04L63/0236—Filtering by address, protocol, port number or service, e.g. IP-address or URL
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/02—Network architectures or network communication protocols for network security for separating internal from external traffic, e.g. firewalls
- H04L63/0227—Filtering policies
- H04L63/0245—Filtering by information in the payload
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/02—Network architectures or network communication protocols for network security for separating internal from external traffic, e.g. firewalls
- H04L63/0227—Filtering policies
- H04L63/0263—Rule management
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/14—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
- H04L63/1408—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
- H04L63/1416—Event detection, e.g. attack signature detection
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
- G06N20/20—Ensemble learning
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N5/00—Computing arrangements using knowledge-based models
- G06N5/01—Dynamic search techniques; Heuristics; Dynamic trees; Branch-and-bound
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L2101/00—Indexing scheme associated with group H04L61/00
- H04L2101/30—Types of network names
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L2463/00—Additional details relating to network architectures or network communication protocols for network security covered by H04L63/00
- H04L2463/144—Detection or countermeasures against botnets
Definitions
- the present invention relates generally to malware detection, particularly to the detection of malicious domain names. Especially, the invention is about identifying malicious domain names produced by a Domain Generation Algorithm (DGA). To this end, the present invention proposes a device, system and method for respectively detecting the malicious domain names.
- DGA Domain Generation Algorithm
- malware families execute an algorithm generating a large number (up to tens of thousands per day) of possible domain names, and attempt to connect to a portion of these generated domains until finding a working server.
- Each DGA uses its own grammar and its own seeding mechanism (time, currency exchange rate and more).
- Some DGAs use combinations of known (e.g. English) words (e.g. abobehaven.net, actionfight.net, etc.).
- Some DGAs purposely collide with benign domains (wdmlmofa.net, yahoo.com, finlwx.com).
- Shallow machine learning based methods such as a combination of clustering and classification algorithms. These methods use large sets of benign and malicious domains, in order to build a domain classifier.
- DNN Deep Neural Network
- the first Recurrent Neural Network (RNN) based implementation of DGA detection proposed a one-hot based, unidirectional RNN using domain information only.
- DNN-based RNN and Convolutional Neural Network (CNN) models were also compared with shallow learning Random Forest models.
- the present invention aims to improve the conventional methods and the mentioned techniques.
- the present invention has the objective to provide a device and method that are able to detect malicious domain names with a higher detection rate. In particular, they should be able to detect accurately even previously unseen DGAs. Furthermore, the detection of false positives should be reduced.
- A "public suffix" is a domain name under which internet users can (or historically could) directly register their own domain names (e.g. pvt.k12.ma.us).
- A "public suffix list" is an initiative of Mozilla, but is maintained as a community resource. It allows browsers, for example, to:
- the usage of a public suffix allows learning separately the "language" of the subdomain, and obtaining the "bias" for the public suffix (e.g. for FQDN sdlsjdkjks.dydns.com the separation of the subdomain and the public suffix will create two outputs: sdlsjdkjks, dydns.com. This allows learning separately the "language" model of sdlsjdkjks and the probability of dydns.com to be used by DGA).
- the subdomain can be omitted from the prediction (e.g. for FQDN kdsksksue.cdn.google.com the output will be google, com, since cdn.google.com is not a public suffix).
- the present invention thus proposes detecting malicious domain names based on a public suffix. Further, the present invention employs particularly a deep neural network model built for processing of domain name and public suffix separately.
- a first aspect of the present invention provides a device for detecting malicious domain names, the device being configured to receive, as an input, a FQDN and a public suffix index, determine a public suffix sequence and a domain characters sequence in the FQDN based on the public suffix index, process the public suffix sequence to obtain a first result indicative of whether the FQDN is malicious or not, process the domain characters sequence to obtain a second result indicative of whether the FQDN is malicious or not, and merge the first result and the second result and determine based on the merged result whether the FQDN is malicious or not.
- the detection accuracy is much improved.
- even domain names generated by a DGA can be detected more accurately, and particularly with fewer false positives.
- the efficiency of the device is significantly improved. This is, because the separation itself requires only little processing, and also the calculation of the result based on the public suffix sequence is not complex.
- the domain characters sequence is thus as short as possible, i.e. the necessary processing is reduced.
- the device comprises a first Long Short-Term Memory (LSTM) network for processing the public suffix sequence, and/or a second LSTM network for processing the domain characters sequence.
- LSTM Long Short-Term Memory
- the first LSTM network and/or the second LSTM network is a Recurrent Neural Network.
- Such RNNs are optimal for the algorithm provided by the device of the first aspect. They can efficiently process the two sequences separately. Thereby, they can be individually trained to reach higher detection accuracy.
- for processing the public suffix sequence, the device is configured to compute a probability that the public suffix sequence and the domain character sequence are used for a malicious FQDN based on determined previous events.
- the device is further configured to compute a probability that the public suffix sequence is used by a DGA.
- the device of the first aspect is particularly suitable for detecting malicious domain names generated by DGAs.
- the device is configured to receive, as an input, a training set for learning the determined previous events.
- for processing the domain characters sequence, the device is configured to calculate a probability that the domain characters sequence is used for a malicious FQDN based on a likelihood of one or more next characters in the sequence.
- for determining whether the FQDN is malicious or not, the device is configured to classify the merged result. By using such a classification, the final determination of whether the domain name is malicious or not can be carried out accurately and fast.
- a second aspect of the present invention provides a system for detecting malicious domain names, the system comprising a monitoring device configured to monitor incoming DNS traffic and determine at least one FQDN from the incoming DNS traffic, and a device according to the first aspect or any of its implementation forms to determine whether the determined FQDN is malicious or not.
- the system of the second aspect achieves all advantages and effects of the device of the first aspect and its implementation forms.
- This system of the second aspect can be implemented, for instance, in a host intrusion detection system, and can provide higher security.
- the system is configured to, after a number of FQDNs has been determined to be malicious, wherein the number is above a determined threshold number, block a process that is an origin of the incoming DNS traffic, from which the FQDNs were determined, or output an alert message.
- a third aspect of the present invention provides a method for detecting malicious domain names, the method comprising receiving, as an input, a FQDN and a public suffix index, determining a public suffix sequence and a domain characters sequence in the FQDN based on the public suffix index, processing the public suffix sequence to obtain a first result indicative of whether the FQDN is malicious or not, processing the domain characters sequence to obtain a second result indicative of whether the FQDN is malicious or not, and merging the first result and the second result and determining based on the merged result whether the FQDN is malicious or not.
- the method comprises processing the public suffix sequence with a first LSTM network, and/or processing the domain characters sequence with a second LSTM network.
- the first LSTM network and/or the second LSTM network is a RNN.
- if the method determines that the FQDN does not include any public suffix sequence, the method further comprises taking the FQDN and omitting the processing of any sub-domain characters sequence of the FQDN.
- for processing the public suffix sequence, the method comprises computing a probability that the public suffix sequence and the domain character sequence are used for a malicious FQDN based on determined previous events.
- the method further comprises computing a probability that the public suffix sequence is used by a DGA.
- the method comprises receiving, as an input, a training set for learning the determined previous events.
- the method comprises calculating a probability that the domain characters sequence is used for a malicious FQDN based on a likelihood of one or more next characters in the sequence.
- the method comprises classifying the merged result.
- a fourth aspect of the present invention provides a computer program product comprising program code for controlling a device according to the first aspect or any of its implementation forms, or for performing, when implemented on a processor, a method according to the third aspect or any of its implementation forms.
- the computer program product may be a data carrier carrying the program code or may be a hardware storage device or the like. It has to be noted that all devices, elements, units and means described in the present application could be implemented in the software or hardware elements or any kind of combination thereof. All steps which are performed by the various entities described in the present application as well as the functionalities described to be performed by the various entities are intended to mean that the respective entity is adapted to or configured to perform the respective steps and functionalities.
- FIG. 1 shows a device according to an embodiment of the present invention.
- FIG. 2 shows a device according to an embodiment of the present invention.
- FIG. 3 shows a device according to an embodiment of the present invention.
- FIG. 4 shows a performance of a device according to an embodiment of the present invention.
- FIG. 5 shows a detection rate for several DGAs achieved by a device according to an embodiment of the present invention.
- FIG. 6 shows a system according to an embodiment of the present invention.
- FIG. 7 shows an integration of a system according to an embodiment of the present invention into a host intrusion detection system.
- FIG. 8 shows a cloud botnet detection service including a device according to an embodiment of the present invention.
- FIG. 9 shows a method according to an embodiment of the present invention.
- FIG. 1 shows a device 100 according to an embodiment of the present invention.
- the device 100 is especially suited for detecting malicious domain names, particularly generated by DGA, and thus for identifying DGAs.
- the device 100 may comprise at least one processor and/or at least one LSTM network configured for implementing functions (a detection algorithm) described in the following. Thereby the at least one LSTM network may be implemented by processing circuitry.
- the device 100 is configured to receive, as an input, a FQDN 101 and a public suffix index 102.
- the public suffix index 102 may also be referred to as a public suffix list. Further, the device
- the device 100 is configured to determine a public suffix sequence 103 and a domain characters sequence 104 in the FQDN 101 based on the public suffix index 102. In other words, the device 100 can extract from the FQDN 101 the public suffix sequence 103 as a first part, and the domain character sequence 104 as a second part. These parts of the FQDN 101 can then be processed separately by the device 100.
- the device 100 is configured to process the public suffix sequence 103 to obtain a first result 105 indicative of whether the FQDN 101 is malicious or not, and to process the domain characters sequence 104 to obtain a second result 106 indicative of whether the FQDN
- the device 100 can comprise at least one LSTM network to carry out the processing.
- An LSTM network may be, for instance, a RNN or CNN.
- the device 100 may be configured to compute a probability that the public suffix sequence 103 or the public suffix sequence 103 and the domain character sequence 104 are used for a malicious FQDN 101, based on determined previous events like a history recording. For instance, the more often a public suffix sequence 103 was already used for a malicious FQDN 101, the higher the probability that it is again used maliciously.
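The idea that a public suffix seen more often in malicious FQDNs is more likely to be malicious again can be illustrated with a simple counting model. This is only a sketch under stated assumptions: the source describes an LSTM path for this step, whereas the hypothetical `SuffixHistory` class below is a plain frequency estimate with Laplace smoothing, and the sample observations are invented for illustration.

```python
from collections import Counter


class SuffixHistory:
    """Hypothetical history recording: counts how often a suffix was malicious."""

    def __init__(self):
        self.malicious = Counter()
        self.total = Counter()

    def observe(self, suffix, is_malicious):
        # Record one previous event for this public suffix.
        self.total[suffix] += 1
        if is_malicious:
            self.malicious[suffix] += 1

    def probability(self, suffix, alpha=1.0):
        # Laplace-smoothed estimate; a never-seen suffix yields 0.5.
        return (self.malicious[suffix] + alpha) / (self.total[suffix] + 2 * alpha)


history = SuffixHistory()
for label in (True, True, True, False):  # invented sample events
    history.observe("dydns.com", label)

seen = history.probability("dydns.com")
unseen = history.probability("example.org")
```

The smoothing constant `alpha` is an assumption; it only keeps the estimate away from 0 and 1 when few events have been recorded.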
- the device 100 may be configured to calculate a probability that the domain characters sequence 104 is used for a malicious FQDN 101 based on a likelihood of one or more next characters in the domain characters sequence 104. For instance, the lower the likelihood of the one or more next characters, the higher the probability that the domain characters sequence 104 is used maliciously.
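The relation between next-character likelihood and maliciousness can be sketched with a character bigram model. This is a stand-in for illustration, not the LSTM path described in the source: the hypothetical `CharBigramModel` is fitted on a handful of benign names chosen here as an assumption, and it reports the average negative log-likelihood ("surprise") of each next character, so a higher value corresponds to less likely characters.

```python
import math
from collections import defaultdict


class CharBigramModel:
    """Hypothetical next-character model with add-one smoothing."""

    def __init__(self, alphabet_size=38):
        # alphabet_size is an assumed bound on distinct characters.
        self.alphabet_size = alphabet_size
        self.counts = defaultdict(lambda: defaultdict(int))

    def fit(self, names):
        for name in names:
            padded = "^" + name.lower()  # "^" marks the start of a name
            for prev, nxt in zip(padded, padded[1:]):
                self.counts[prev][nxt] += 1

    def next_char_prob(self, prev, nxt):
        row = self.counts[prev]
        total = sum(row.values())
        return (row.get(nxt, 0) + 1) / (total + self.alphabet_size)

    def surprise(self, name):
        # Average negative log-likelihood of each next character.
        padded = "^" + name.lower()
        pairs = list(zip(padded, padded[1:]))
        if not pairs:
            return 0.0
        logs = (math.log(self.next_char_prob(p, n)) for p, n in pairs)
        return -sum(logs) / len(pairs)


model = CharBigramModel()
model.fit(["google", "facebook", "amazon", "wikipedia", "youtube", "yahoo"])
benign_score = model.surprise("google")
random_score = model.surprise("kmcokkdoqwvfgk")
```

With this toy training set, the random-looking sequence receives the higher surprise value, which matches the direction described above.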
- the device 100 is configured to merge the first result 105 and the second result 106 to obtain a merged result 107. Based on the merged result 107, the device 100 is configured to determine as an end result, whether the FQDN 101 is malicious or not. When merging the first result 105 and the second result 106, the device 100 may also be configured to weight the results.
- FIG. 2 shows a device 100 according to an embodiment of the present invention, which builds on the device 100 shown in FIG. 1. Same elements in FIG. 1 and FIG. 2 are labelled with same reference signs and function likewise. Accordingly, also the device 100 of FIG. 2 is configured to receive the public suffix index 102 and the FQDN 101, respectively, and to determine in a two-step process, whether the FQDN 101 is malicious or not. For the two-step process, the device 100 particularly uses two different paths 202, 203 in a deep learning model, particularly two different LSTM networks.
- FIG. 2 shows particularly that the public suffix index 102 and the FQDN 101 are input to a unit 200 of the device 100 configured for top level domain extraction.
- This extraction unit 200 yields the domain characters sequence 104 and the public suffix sequence 103, which is here referred to as a public suffix array.
- the public suffix sequence 103 then takes a first path through e.g. a first LSTM 202, which yields the first result 105.
- the domain character sequence 104 takes a second path through e.g. a second LSTM 203, which yields the second result 106.
- These two results 105 and 106 are then merged in a merge layer 204 of the device 100, which yields the merged result 107.
- the deep learning model takes the decision, whether the input FQDN 101 is malicious or not.
- the LSTMs 202 and 203 and the merge layer 204 are part of the deep learning model.
- FIG. 3 shows a device 100 according to an embodiment of the present invention, which builds on the device 100 shown in FIG. 2. Same elements in FIG. 2 and FIG. 3 are labelled with the same reference signs and function likewise. In particular, FIG. 3 shows the deep learning model of the device 100 shown in FIG. 2 in more detail.
- the deep learning model consists particularly of the two separate LSTM networks 202 and 203.
- the second LSTM 203 is for the processing of the domain characters sequence 104 (e.g. kmcokkdoqwvfgk) and the first LSTM 202 is for the processing of the public suffix sequence 103 or array (e.g. the public suffix act.edu.au is represented by the array ['act', 'edu', 'au']).
- the respective results 105 and 106 are merged in the merge concatenation layer 204, and are processed by fully connected layers 306 and 308, respectively.
- a first fully connected layer 306 processes the output of the first LSTM 202 to produce the result 105.
- a second fully connected layer 308 processes the merged result 107.
- the output of the device 100 is then predicted, i.e. whether the FQDN 101 is malicious or not.
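The two-path structure (one LSTM per sequence, a concatenation merge, then a fully connected output) can be sketched as a forward pass in plain Python. Everything concrete here is an assumption made for illustration: the class names `TinyLSTM` and `TwoPathModel`, the one-hot encodings, the hashed suffix buckets, the hidden size and the random, untrained weights. The sketch also simplifies the described model to a single fully connected layer after the merge, so its output is not a meaningful prediction.

```python
import math
import random
import zlib

ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789-_"
SUFFIX_BUCKETS = 16  # assumed size of the hashed suffix-label vocabulary


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def one_hot(index, size):
    vec = [0.0] * size
    vec[index] = 1.0
    return vec


def encode_chars(domain):
    # Unknown characters share the last slot.
    size = len(ALPHABET) + 1
    return [one_hot(ALPHABET.index(ch) if ch in ALPHABET else size - 1, size)
            for ch in domain.lower()]


def encode_suffix(labels):
    # Deterministic hash so the same label always maps to the same bucket.
    return [one_hot(zlib.crc32(lab.encode()) % SUFFIX_BUCKETS, SUFFIX_BUCKETS)
            for lab in labels]


class TinyLSTM:
    """Single LSTM layer with random (untrained) weights."""

    def __init__(self, input_size, hidden_size, rng):
        self.hidden_size = hidden_size
        cols = input_size + hidden_size + 1  # +1 for the bias term
        self.w = {gate: [[rng.uniform(-0.1, 0.1) for _ in range(cols)]
                         for _ in range(hidden_size)] for gate in "ifoc"}

    def _gate(self, name, z, squash):
        return [squash(sum(w * v for w, v in zip(row, z))) for row in self.w[name]]

    def run(self, inputs):
        h = [0.0] * self.hidden_size
        c = [0.0] * self.hidden_size
        for x in inputs:
            z = list(x) + h + [1.0]
            i = self._gate("i", z, sigmoid)    # input gate
            f = self._gate("f", z, sigmoid)    # forget gate
            o = self._gate("o", z, sigmoid)    # output gate
            g = self._gate("c", z, math.tanh)  # candidate cell state
            c = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c, i, g)]
            h = [ov * math.tanh(cv) for ov, cv in zip(o, c)]
        return h  # last hidden state summarises the sequence


class TwoPathModel:
    def __init__(self, hidden=8, seed=0):
        rng = random.Random(seed)
        self.suffix_lstm = TinyLSTM(SUFFIX_BUCKETS, hidden, rng)
        self.domain_lstm = TinyLSTM(len(ALPHABET) + 1, hidden, rng)
        self.dense = [rng.uniform(-0.1, 0.1) for _ in range(2 * hidden + 1)]

    def predict(self, domain, suffix_labels):
        first = self.suffix_lstm.run(encode_suffix(suffix_labels))
        second = self.domain_lstm.run(encode_chars(domain))
        merged = first + second + [1.0]  # concatenation merge plus bias input
        return sigmoid(sum(w * v for w, v in zip(self.dense, merged)))


score = TwoPathModel(seed=0).predict("kmcokkdoqwvfgk", ["act", "edu", "au"])
```

In practice such a model would be built and trained with a deep learning framework; the pure-Python version is used here only to keep the data flow between the two paths and the merge visible.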
- the deep learning model has, for instance, been trained on the 1M Alexa Index of most popular sites, the DMOZ index with more than 3M manually edited non-malicious domains, and about 1M DGA samples taken from Open-source intelligence (OSINT) and DGArchive (DGArchive.caad.fkie.fraunhofer.de).
- OSINT Open-source intelligence
- DGArchive DGArchive.caad.fkie.fraunhofer.de
- FIG. 4 compares the performance of a device 100 according to the present invention, which particularly implements the above-described deep learning model, with a device implementing a conventional model (e.g. an algorithm presented by Norwegian Computing Center). It can be seen that the device 100 according to an embodiment of the present invention shows a significantly improved performance with respect to both validation accuracy and validation loss over the conventional device. That is, the validation accuracy of the device 100 is considerably higher than that of the conventional device, while its validation loss is considerably lower.
- a conventional model e.g. an algorithm presented by Norwegian Computing Center
- FIG. 5 shows a detection rate achieved by a device 100 according to an embodiment of the present invention for several DGAs.
- the table shown in FIG. 5 names the various DGAs, describes them briefly, and demonstrates that the probability of the device 100 detecting unseen domains produced with the various DGAs is consistently high. Further, the probability of false positives on non-malicious domains from unseen sources is low.
- FIG. 6 shows a system 600 according to an embodiment of the present invention.
- the system 600 is particularly for detecting malicious domain names, especially malicious domain names generated by a DGA.
- the system 600 comprises a monitoring device 601 configured to monitor incoming DNS traffic 602 and to determine at least one FQDN 101 from the incoming DNS traffic 602.
- the system 600 also comprises a device 100 according to an embodiment of the present invention, as for example shown in FIG. 1, 2 or 3.
- the device 100 is configured to determine, whether the determined FQDN 101 is malicious or not. This determination is achieved with the two-step process explained above.
- FIG. 7 shows that the system 600 (and thus respectively also the device 100) may be implemented in, or may even be, a host intrusion detection system (HIDS).
- the HIDS may be a Cloud Service provided to consumers.
- the HIDS may be composed of several plugins running on a common agent-based platform at the side of a Guest Virtual Machine (VM).
- the DGA plugin i.e. the system 600, may run on top of the HIDS Agent platform, and may passively sniff the DNS traffic 602. Once a new DNS lookup is detected, the FQDN 101 is sent to a Cloud Botnet Detection Service, which includes the device 100. If the Cloud Botnet Detection Service detects a malicious domain, the HIDS waits for certain threshold (e.g.
- the system 600 in the HIDS is configured to, after a number of FQDNs 101 has been determined to be malicious, wherein the number is above a determined threshold number, block a process that is an origin of the incoming DNS traffic 602, from which the FQDNs 101 were determined, or output an alert message.
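The threshold behaviour can be sketched as a per-process counter. This is a minimal illustration under assumptions: the class name `ProcessMonitor`, the process identifiers, the threshold value and the sample FQDNs are hypothetical, and the `is_malicious` flag stands in for the verdict returned by the detection device.

```python
from collections import defaultdict


class ProcessMonitor:
    """Hypothetical counter of malicious FQDNs per originating process."""

    def __init__(self, threshold=2):
        self.threshold = threshold
        self.malicious_counts = defaultdict(int)

    def report(self, process_id, fqdn, is_malicious):
        # Count malicious verdicts and act once the count exceeds the threshold.
        if is_malicious:
            self.malicious_counts[process_id] += 1
        if self.malicious_counts[process_id] > self.threshold:
            return "block"  # could equally output an alert message
        return "allow"


monitor = ProcessMonitor(threshold=2)
actions = [
    monitor.report(4242, name, True)
    for name in ("kmcokkdoqwvfgk.net", "sdlsjdkjks.dydns.com", "wdmlmofa.net")
]
other = monitor.report(17, "google.com", False)
```

Counting per process rather than per host is a design choice taken from the text, which blocks the process that is the origin of the DNS traffic.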
- FIG. 8 shows a Cloud Botnet Detection Service, e.g. the one used in the system 600 of FIG. 7, including the device 100.
- the Cloud Botnet Detection Service may be part of a Galaxy Big Data and AI platform. Galaxy is responsible for data aggregation from multiple sources and its processing (including model building).
- a DGA Feed Aggregation component is responsible for data aggregation both for proven benign domains (e.g. Alexa, DMOZ, Huawei DNSaaS) and malicious domains (e.g. DGArchive, Malwaredomainlist, OSINT and more).
- the aggregated data is stored in Big Data platform.
- a botnet detection service implemented by or including the device 100 according to an embodiment of the present invention, is responsible for periodic training of the model described above. The trained model is used for inference for domain lists that are coming from HIDS agents.
- FIG. 9 shows a method 900 according to an embodiment of the present invention.
- the method 900 may be carried out by a device 100 according to an embodiment of the present invention, as e.g. shown in FIG. 1, 2 or 3, or a system 600 as shown in FIG. 6 or FIG. 7.
- the method 900 comprises a step 901 of receiving, as an input, a FQDN 101 and a public suffix index 102. Further, it comprises a step 902 of determining a public suffix sequence 103 and a domain characters sequence 104 in the FQDN 101 based on the public suffix index 102. Further, it comprises a step 903 of processing the public suffix sequence 103 to obtain a first result 105 indicative of whether the FQDN 101 is malicious or not.
- step 904 of processing the domain characters sequence 104 to obtain a second result 106 indicative of whether the FQDN 101 is malicious or not.
- step 905 of merging the first result 105 and the second result 106 and determining based on the merged result 107 whether the FQDN 101 is malicious or not.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- General Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Computing Systems (AREA)
- Computer Security & Cryptography (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Biophysics (AREA)
- Artificial Intelligence (AREA)
- Data Mining & Analysis (AREA)
- Molecular Biology (AREA)
- Computational Linguistics (AREA)
- Biomedical Technology (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Evolutionary Computation (AREA)
- Signal Processing (AREA)
- Computer Networks & Wireless Communication (AREA)
- Computer Hardware Design (AREA)
- Life Sciences & Earth Sciences (AREA)
- General Business, Economics & Management (AREA)
- Business, Economics & Management (AREA)
- Virology (AREA)
- Data Exchanges In Wide-Area Networks (AREA)
Abstract
The present invention relates to the detection of malicious domain names, particularly those generated by a Domain Generation Algorithm. To this end, the present invention provides a device, system, and method. The device is configured to receive, as an input, a Fully-Qualified Domain Name (FQDN) and a public suffix index. The device can determine a public suffix sequence and a domain characters sequence in the FQDN based on the public suffix index. Then, the device is configured to process the public suffix sequence to obtain a first result indicative of whether the FQDN is malicious or not, to process the domain characters sequence to obtain a second result indicative of whether the FQDN is malicious or not, and to merge the first result and the second result and determine based on the merged result whether the FQDN is malicious or not.
Description
DEVICE AND METHOD FOR DETECTING MALICIOUS DOMAIN NAMES
TECHNICAL FIELD
The present invention relates generally to malware detection, particularly to the detection of malicious domain names. Especially, the invention is about identifying malicious domain names produced by a Domain Generation Algorithm (DGA). To this end, the present invention proposes a device, system and method for respectively detecting the malicious domain names.
BACKGROUND
Many botnets, Trojans and other new malware families use DGAs to generate a large number of domain names to connect to a command and control (C&C) server. Older families of malware relied on static lists of domains or IP addresses that were hardcoded in the malware code running on the infected hosts. Once a given malware was discovered, it could then be neutralized by blocking the connections to these network addresses, in order to prevent further communications between the infected hosts and the C&C server. However, starting from the Kraken botnet (released in 2008), the newer families of malware started using DGAs to circumvent such takedown attempts. Instead of relying on a fixed list of domains or IP addresses, these malware families execute an algorithm generating a large number (up to tens of thousands per day) of possible domain names, and attempt to connect to a portion of these generated domains until finding a working server.
Detecting and blocking such newer malware families using DGAs presents several challenges:
• Each DGA uses its own grammar and its own seeding mechanism (time, currency exchange rate and more).
• Some DGAs use combinations of known (e.g. English) words (e.g. abobehaven.net, actionfight.net, etc.).
• Some DGAs purposely collide with benign domains (wdmlmofa.net, yahoo.com, finlwx.com).
• The frequency of Domain Name System (DNS) lookup queries could vary significantly.
There are several possible techniques to identify malicious domains:
• Domain blacklisting, which is a fully reactive approach with an almost zero rate of false positives.
• Heuristic approaches for identifying DGA by modelling their lexical structures or query points to a non-existent domain. These heuristic approaches require data accumulation over large time windows, and cannot really help with real-time malware detection.
• Shallow machine learning based methods, such as a combination of clustering and classification algorithms. These methods use large sets of benign and malicious domains, in order to build a domain classifier.
• Deep Neural Network (DNN) based algorithms. These algorithms show the best performance and accuracy:
■ The first Recurrent Neural Network (RNN) based implementation of DGA detection proposed a one-hot based, unidirectional RNN using domain information only.
■ This implementation was then extended by implementing a bidirectional RNN, adding a dense feed-forward layer, and predicting the type of DGA (for instance, Suppobox).
■ DNN-based RNN and Convolutional Neural Network (CNN) models were also compared with shallow learning Random Forest models.
However, all of these techniques (including DNN) generalize poorly to unseen DGAs and, basically, are effective only in identifying formerly seen attacks. There are several types of DGAs which cannot be identified by these techniques, even if present in the training set.
In summary, all of these techniques share the limitation that their detection rates are very low for previously undetectable and unseen DGAs.
SUMMARY
In view of the above-mentioned challenges, the present invention aims to improve the conventional methods and the mentioned techniques. The present invention has the objective to provide a device and method that are able to detect malicious domain names with a higher detection rate. In particular, they should be able to detect accurately even previously unseen DGAs. Furthermore, the detection of false positives should be reduced.
The objective of the present invention is achieved by the solution provided in the enclosed independent claims. Advantageous implementations of the present invention are further defined in the dependent claims. The invention is generally based on the realization that a public suffix could be helpful for DGA identification.
A “public suffix” is a domain name under which internet users can (or historically could) directly register their own domain names (e.g. pvt.k12.ma.us).
A “public suffix list” is an initiative of Mozilla, but is maintained as a community resource. It allows browsers to, for example:
• Avoid privacy-damaging "supercookies" being set for high-level domain name suffixes.
• Highlight the most important part of a domain name in the user interface.
• Accurately sort history entries by site.
There are two major factors that influence DGA identification accuracy:
• Many DGAs hide behind known domains as a subdomain (e.g. dydns.org, mooo.com and others).
• Many web applications/services use pseudo-random subdomains for their own needs (e.g. kdsksksue.cdn.google.com).
For the first use-case, the usage of a public suffix allows learning the “language” of the subdomain separately, and obtaining the “bias” for the public suffix. For example, for the FQDN sdlsjdkjks.dydns.com, separating the subdomain and the public suffix creates two outputs: sdlsjdkjks and dydns.com. This allows learning separately the “language” model of sdlsjdkjks and the probability of dydns.com being used by a DGA.
For the second use-case, the subdomain can be omitted from the prediction. For example, for the FQDN kdsksksue.cdn.google.com the output will be google and com, since cdn.google.com is not a public suffix.
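The two use-cases can be illustrated with a minimal Python sketch. The suffix set below is a hypothetical, hand-picked excerpt (including dydns.com, which the text treats as a public suffix), and the function name split_fqdn is an assumption for illustration only. A real public suffix list also contains wildcard and exception rules, which this sketch ignores.

```python
# Hypothetical, tiny excerpt of a public suffix index; a real deployment
# would load the full, community-maintained public suffix list instead.
PUBLIC_SUFFIXES = {"com", "org", "au", "edu.au", "act.edu.au", "dydns.com"}


def split_fqdn(fqdn, suffixes=PUBLIC_SUFFIXES):
    """Return (domain_characters, public_suffix_labels) for an FQDN.

    The longest matching suffix wins; the label directly left of it is the
    domain characters sequence, and any deeper sub-domain labels are dropped.
    """
    labels = fqdn.lower().rstrip(".").split(".")
    # Scanning from the left tries the longest candidate suffix first.
    for start in range(len(labels)):
        candidate = ".".join(labels[start:])
        if candidate in suffixes:
            domain = labels[start - 1] if start > 0 else ""
            return domain, labels[start:]
    # No known public suffix: treat the whole name as domain characters.
    return ".".join(labels), []
```

With this excerpt, sdlsjdkjks.dydns.com is split into the domain characters sdlsjdkjks and the suffix labels dydns and com, while kdsksksue.cdn.google.com yields google and com, with the pseudo-random sub-domain labels dropped.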
In particular the present invention thus proposes detecting malicious domain names based on a public suffix. Further, the present invention employs particularly a deep neural network model built for processing of domain name and public suffix separately.
A first aspect of the present invention provides a device for detecting malicious domain names, the device being configured to receive, as an input, a FQDN and a public suffix index, determine a public suffix sequence and a domain characters sequence in the FQDN based on the public suffix index, process the public suffix sequence to obtain a first result indicative of whether the FQDN is malicious or not, process the domain characters sequence to obtain a second result indicative of whether the FQDN is malicious or not, and merge the first result and the second result and determine based on the merged result whether the FQDN is malicious or not.
By calculating the first result and the second result separately, and then merging these two results to determine whether the domain name is malicious or not, the detection accuracy is much improved. In particular, even domain names generated by a DGA can be detected more accurately, and particularly with fewer false positives. Furthermore, by separating the domain name based on the public suffix index into the public suffix sequence and the domain characters sequence, the efficiency of the device is significantly improved. This is because the separation itself requires only little processing, and the calculation of the result based on the public suffix sequence is not complex either. Moreover, the domain characters sequence is thus as short as possible, i.e. the necessary processing is reduced.
In an implementation form of the first aspect, the device comprises a first Long Short-Term Memory (LSTM) network for processing the public suffix sequence, and/or a second LSTM network for processing the domain characters sequence.
Using two such LSTM networks on the respective sequences yields an efficient and accurate detection of the malicious domain names.
In a further implementation form of the first aspect, the first LSTM network and/or the second LSTM network is a Recurrent Neural Network.
Such RNNs are optimal for the algorithm provided by the device of the first aspect. They can efficiently process the two sequences separately. Thereby, they can be individually trained to reach higher detection accuracy.
In a further implementation form of the first aspect, for processing the public suffix sequence, the device is configured to compute a probability that the public suffix sequence and the domain character sequence are used for a malicious FQDN based on determined previous events.
This computation of the probability based on the previous events requires only little processing load, but is rather accurate.
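As a rough illustration of a probability derived from previous events, the following Python sketch keeps simple counts per public suffix. It is a counting stand-in, not the LSTM-based processing of the implementation forms; the class name SuffixHistory and the add-alpha smoothing are assumptions of this sketch.

```python
from collections import Counter


class SuffixHistory:
    """Counts how often each public suffix appeared in malicious and benign FQDNs."""

    def __init__(self):
        self.malicious = Counter()
        self.benign = Counter()

    def record(self, suffix, is_malicious):
        # Store one previous event for the given public suffix.
        (self.malicious if is_malicious else self.benign)[suffix] += 1

    def probability_malicious(self, suffix, alpha=1.0):
        # Add-alpha smoothing is an assumption of this sketch; it keeps
        # unseen suffixes at a neutral 0.5 instead of dividing by zero.
        bad = self.malicious[suffix] + alpha
        good = self.benign[suffix] + alpha
        return bad / (bad + good)
```

The more often a suffix was recorded as malicious, the closer the returned value moves towards 1.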
In a further implementation form of the first aspect, the device is further configured to compute a probability that the public suffix sequence is used by a DGA.
Thus, the device of the first aspect is particularly suitable for detecting malicious domain names generated by DGAs.
In a further implementation form of the first aspect, the device is configured to receive, as an input, a training set for learning the determined previous events.
This allows the device of the first aspect to operate with an even higher detection accuracy. In particular, false positive detections can be better avoided.
In a further implementation form of the first aspect, for processing the domain characters sequence, the device is configured to calculate a probability that the domain characters sequence is used for a malicious FQDN based on a likelihood of one or more next characters in the sequence.
This leads to more accurate results. Further, since the domain characters sequence is as short as possible, the device efficiency is high.
In a further implementation form of the first aspect, for determining whether the FQDN is malicious or not, the device is configured to classify the merged result.
By using such a classification, the final determination of whether the domain name is malicious or not can be carried out accurately and fast.
A second aspect of the present invention provides a system for detecting malicious domain names, the system comprising a monitoring device configured to monitor incoming DNS traffic and determine at least one FQDN from the incoming DNS traffic, and a device according to the first aspect or any of its implementation forms to determine whether the determined FQDN is malicious or not.
Accordingly, the system of the second aspect achieves all advantages and effects of the device of the first aspect and its implementation forms. This system of the second aspect can be implemented, for instance, in a host intrusion detection system, and can provide higher security.
In an implementation form of the second aspect, the system is configured to, after a number of FQDNs has been determined to be malicious, wherein the number is above a determined threshold number, block a process that is an origin of the incoming DNS traffic, from which the FQDNs were determined, or output an alert message.
A third aspect of the present invention provides a method for detecting malicious domain names, the method comprising receiving, as an input, a FQDN and a public suffix index, determining a public suffix sequence and a domain characters sequence in the FQDN based on the public suffix index, processing the public suffix sequence to obtain a first result indicative of whether the FQDN is malicious or not, processing the domain characters sequence to obtain a second result indicative of whether the FQDN is malicious or not, and merging the first result and the second result and determining based on the merged result whether the FQDN is malicious or not.
In an implementation form of the third aspect, the method comprises processing the public suffix sequence with a first LSTM network, and/or processing the domain characters sequence with a second LSTM network.
In a further implementation form of the third aspect, the first LSTM network and/or the second LSTM network is a RNN.
In a further implementation form of the third aspect, if the method determines that the FQDN does not include any public suffix sequence, the method further comprises taking the FQDN and omitting the processing of any sub-domain characters sequence of the FQDN.
In a further implementation form of the third aspect, for processing the public suffix sequence, the method comprises computing a probability that the public suffix sequence and the domain character sequence are used for a malicious FQDN based on determined previous events.
In a further implementation form of the third aspect, the method further comprises computing a probability that the public suffix sequence is used by a DGA.
In a further implementation form of the third aspect, the method comprises receiving, as an input, a training set for learning the determined previous events.
In a further implementation form of the third aspect, for processing the domain characters sequence, the method comprises calculating a probability that the domain characters sequence is used for a malicious FQDN based on a likelihood of one or more next characters in the sequence.
In a further implementation form of the third aspect, for determining whether the FQDN is malicious or not, the method comprises classifying the merged result.
The method of the third aspect and its implementation forms achieve the same effects and advantages described with respect to the device of the first aspect and its respective implementation forms.
A fourth aspect of the present invention provides a computer program product comprising program code for controlling a device according to the first aspect or any of its implementation forms, or for performing, when implemented on a processor, a method according to the third aspect or any of its implementation forms.
Accordingly, with the program code, which is e.g. stored on the computer program product, the above-described advantages and effects of the method of the third aspect and of the device of the first aspect can respectively be achieved. The computer program product may be a data carrier carrying the program code or may be a hardware storage device or the like.
It has to be noted that all devices, elements, units and means described in the present application could be implemented in software or hardware elements or any kind of combination thereof. All steps which are performed by the various entities described in the present application as well as the functionalities described to be performed by the various entities are intended to mean that the respective entity is adapted to or configured to perform the respective steps and functionalities.
Even if, in the following description of specific embodiments, a specific functionality or step to be performed by external entities is not reflected in the description of a specific detailed element of that entity which performs that specific step or functionality, it should be clear for a skilled person that these methods and functionalities can be implemented in respective software or hardware elements, or any kind of combination thereof.
BRIEF DESCRIPTION OF DRAWINGS
The above described aspects and implementation forms of the present invention will be explained in the following description of specific embodiments in relation to the enclosed drawings, in which
FIG. 1 shows a device according to an embodiment of the present invention.
FIG. 2 shows a device according to an embodiment of the present invention.
FIG. 3 shows a device according to an embodiment of the present invention.
FIG. 4 shows a performance of a device according to an embodiment of the present invention.
FIG. 5 shows a detection rate for several DGAs achieved by a device according to an embodiment of the present invention.
FIG. 6 shows a system according to an embodiment of the present invention.
FIG. 7 shows an integration of a system according to an embodiment of the present invention into a host intrusion detection system.
FIG. 8 shows a cloud botnet detection service including a device according to an embodiment of the present invention.
FIG. 9 shows a method according to an embodiment of the present invention.
DETAILED DESCRIPTION OF EMBODIMENTS
FIG. 1 shows a device 100 according to an embodiment of the present invention. The device 100 is especially suited for detecting malicious domain names, particularly generated by DGA, and thus for identifying DGAs. The device 100 may comprise at least one processor and/or at least one LSTM network configured for implementing functions (a detection algorithm) described in the following. Thereby the at least one LSTM network may be implemented by processing circuitry.
The device 100 is configured to receive, as an input, a FQDN 101 and a public suffix index 102. The public suffix index 102 may also be referred to as a public suffix list. Further, the device 100 is configured to determine a public suffix sequence 103 and a domain characters sequence 104 in the FQDN 101 based on the public suffix index 102. In other words, the device 100 can extract from the FQDN 101 the public suffix sequence 103 as a first part, and the domain characters sequence 104 as a second part. These parts of the FQDN 101 can then be processed separately by the device 100.
In particular, the device 100 is configured to process the public suffix sequence 103 to obtain a first result 105 indicative of whether the FQDN 101 is malicious or not, and to process the domain characters sequence 104 to obtain a second result 106 indicative of whether the FQDN 101 is malicious or not. To this end the device 100 can comprise at least one LSTM network to carry out the processing. An LSTM network may be, for instance, a RNN or CNN. For obtaining the first result 105, the device 100 may be configured to compute a probability that the public suffix sequence 103, or the public suffix sequence 103 and the domain characters sequence 104, are used for a malicious FQDN 101, based on determined previous events like a history recording. For instance, the more often a public suffix sequence 103 was already used for a malicious FQDN 101, the higher the probability that it is again used maliciously. For obtaining the second result 106, the device 100 may be configured to calculate a probability that the domain characters sequence 104 is used for a malicious FQDN 101 based on a likelihood of one or more next characters in the domain characters sequence 104. For instance, the lower the likelihood of the one or more next characters, the higher the probability that the domain characters sequence 104 is used maliciously.
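The idea of scoring a domain characters sequence by the likelihood of its next characters can be sketched in Python as follows. A character bigram model is used here purely as a lightweight stand-in for an LSTM; the class name CharBigramModel, the alphabet, the add-one smoothing and the toy training names are all assumptions of this sketch.

```python
import math
from collections import defaultdict


class CharBigramModel:
    """Next-character likelihood model over domain characters (bigram stand-in)."""

    START = "^"  # artificial marker preceding the first character

    def __init__(self, alphabet="abcdefghijklmnopqrstuvwxyz0123456789-"):
        self.alphabet = alphabet
        self.counts = defaultdict(lambda: defaultdict(int))

    def fit(self, benign_domains):
        # Count how often each character follows the previous one.
        for name in benign_domains:
            prev = self.START
            for ch in name:
                self.counts[prev][ch] += 1
                prev = ch

    def next_char_probability(self, prev, ch):
        # Add-one smoothing over the alphabet (an assumption of this sketch).
        row = self.counts[prev]
        return (row[ch] + 1) / (sum(row.values()) + len(self.alphabet))

    def mean_surprise(self, name):
        """Average negative log-likelihood per character; higher means less expected."""
        prev, total = self.START, 0.0
        for ch in name:
            total -= math.log(self.next_char_probability(prev, ch))
            prev = ch
        return total / max(len(name), 1)
```

A sequence whose next characters are consistently unlikely under the model receives a higher mean surprise, which corresponds to a higher probability of malicious use.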
Finally, the device 100 is configured to merge the first result 105 and the second result 106 to obtain a merged result 107. Based on the merged result 107, the device 100 is configured to determine, as an end result, whether the FQDN 101 is malicious or not. When merging the first result 105 and the second result 106, the device 100 may also be configured to weight the results.
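One possible way to weight and merge two probability-like results is shown in the Python sketch below. Combining the results in log-odds space, the weight parameters and the 0.5 decision threshold are assumptions chosen for illustration; the description itself does not prescribe a particular merge formula.

```python
import math


def merge_results(first_result, second_result, w_first=1.0, w_second=1.0, bias=0.0):
    """Weighted merge of two probabilities in log-odds space, returned as a probability."""
    def logit(p):
        p = min(max(p, 1e-6), 1 - 1e-6)  # clamp to avoid log(0)
        return math.log(p / (1 - p))

    z = w_first * logit(first_result) + w_second * logit(second_result) + bias
    return 1 / (1 + math.exp(-z))


def is_malicious(first_result, second_result, threshold=0.5, **weights):
    # Classify the merged result against a decision threshold.
    return merge_results(first_result, second_result, **weights) >= threshold
```

Raising w_first makes the public suffix result dominate the decision, while raising w_second favours the domain characters result.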
FIG. 2 shows a device 100 according to an embodiment of the present invention, which builds on the device 100 shown in FIG. 1. Same elements in FIG. 1 and FIG. 2 are labelled with same reference signs and function likewise. Accordingly, also the device 100 of FIG. 2 is configured to receive the public suffix index 102 and the FQDN 101, respectively, and to determine in a two-step process, whether the FQDN 101 is malicious or not. For the two-step process, the device 100 particularly uses two different paths 202, 203 in a deep learning model, particularly two different LSTM networks.
FIG. 2 shows particularly that the public suffix index 102 and the FQDN 101 are input to a unit 200 of the device 100 configured for top level domain extraction. This extraction unit 200 yields the domain characters sequence 104 and the public suffix sequence 103, which is here referred to as a public suffix array. The public suffix sequence 103 then takes a first path through e.g. a first LSTM 202, which yields the first result 105. The domain characters sequence 104 takes a second path through e.g. a second LSTM 203, which yields the second result 106. These two results 105 and 106 are then merged in a merge layer 204 of the device 100, which yields the merged result 107. Based on the merged result 107, the deep learning model takes the decision whether the input FQDN 101 is malicious or not. Notably, the LSTMs 202 and 203 and the merge layer 204 are part of the deep learning model.
FIG. 3 shows a device 100 according to an embodiment of the present invention, which builds on the device 100 shown in FIG. 2. Same elements in FIG. 2 and FIG. 3 are labelled with the same reference signs and function likewise. In particular, FIG. 3 shows the deep learning model of the device 100 shown in FIG. 2 in more detail.
The deep learning model consists particularly of the two separate LSTM networks 202 and 203. The second LSTM 203 is for the processing of the domain characters sequence 104 (e.g. kmcokkdoqwvfgk) and the first LSTM 202 is for the processing of the public suffix sequence 103 or array (e.g. the public suffix act.edu.au is represented by the array ['act', 'edu', 'au']).
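Before the two sequences can be fed to sequence models, they are typically turned into integer ids. The following Python sketch shows one way to do this; the character set, the padding scheme, the maximum lengths and the small suffix vocabulary in the usage note are assumptions of this sketch and are not taken from the description.

```python
CHARSET = "abcdefghijklmnopqrstuvwxyz0123456789-_"
CHAR_TO_ID = {ch: i + 1 for i, ch in enumerate(CHARSET)}  # 0 is reserved for padding


def encode_domain(domain, max_len=63):
    """Map domain characters to integer ids, left-padded with 0 to max_len."""
    # Characters outside CHARSET share one out-of-vocabulary id.
    ids = [CHAR_TO_ID.get(ch, len(CHARSET) + 1) for ch in domain.lower()[:max_len]]
    return [0] * (max_len - len(ids)) + ids


def encode_suffix(labels, vocab, max_labels=4):
    """Map public suffix labels to vocabulary ids; unknown labels share id 1."""
    ids = [vocab.get(label, 1) for label in labels[-max_labels:]]
    return [0] * (max_labels - len(ids)) + ids
```

With a hypothetical vocabulary {"au": 2, "edu": 3, "act": 4}, the array ['act', 'edu', 'au'] is encoded as [0, 4, 3, 2], and kmcokkdoqwvfgk becomes a list of 63 ids whose last 14 entries are non-zero.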
The respective results 105 and 106 are merged in the merge concatenation layer 204, and are processed by fully connected layers 306 and 308, respectively. A first fully connected layer 306 processes the output of the first LSTM 202 to produce the result 105, and a second fully connected layer 308 processes the merged result 107. The output of the device 100 is then predicted, i.e. whether the FQDN 101 is malicious or not.
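The two-path structure can be sketched as a forward pass in plain Python. This is a simplified illustration with randomly initialised, untrained weights: the hidden size, the embedding tables, the class names and the single output layer (instead of the separate layers 306 and 308) are assumptions of this sketch, so its output carries no detection meaning.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class TinyLSTM:
    """Single-layer LSTM over integer token ids, with randomly initialised weights."""

    def __init__(self, vocab_size, hidden, rng):
        self.hidden = hidden
        rand = lambda n: [rng.uniform(-0.1, 0.1) for _ in range(n)]
        self.embed = [rand(hidden) for _ in range(vocab_size)]
        # One weight row per gate unit, covering [input ; previous hidden state].
        self.w = {g: [rand(2 * hidden) for _ in range(hidden)] for g in "ifoc"}
        self.b = {g: [0.0] * hidden for g in "ifoc"}

    def _gate(self, g, xh, act):
        return [act(sum(w * v for w, v in zip(row, xh)) + b)
                for row, b in zip(self.w[g], self.b[g])]

    def forward(self, token_ids):
        h = [0.0] * self.hidden
        c = [0.0] * self.hidden
        for t in token_ids:
            xh = self.embed[t] + h  # concatenate embedding and hidden state
            i = self._gate("i", xh, sigmoid)    # input gate
            f = self._gate("f", xh, sigmoid)    # forget gate
            o = self._gate("o", xh, sigmoid)    # output gate
            g = self._gate("c", xh, math.tanh)  # candidate cell state
            c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
            h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
        return h  # final hidden state summarises the sequence


class TwoPathModel:
    """Two LSTM paths whose outputs are concatenated and classified."""

    def __init__(self, n_suffix_tokens, n_chars, hidden=8, seed=0):
        rng = random.Random(seed)
        self.suffix_lstm = TinyLSTM(n_suffix_tokens, hidden, rng)  # first path
        self.domain_lstm = TinyLSTM(n_chars, hidden, rng)          # second path
        self.w_out = [rng.uniform(-0.1, 0.1) for _ in range(2 * hidden)]
        self.b_out = 0.0

    def predict(self, suffix_ids, domain_ids):
        # Merge by concatenation, then apply one sigmoid output unit.
        merged = self.suffix_lstm.forward(suffix_ids) + self.domain_lstm.forward(domain_ids)
        return sigmoid(sum(w * v for w, v in zip(self.w_out, merged)) + self.b_out)
```

In practice such a model would be built and trained with a deep learning framework; the pure-Python form is only meant to make the data flow through the two paths and the merge step visible.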
The deep learning model has, for instance, been trained on the Alexa index of the 1M most popular sites, the DMOZ index with more than 3M manually edited non-malicious domains, and about 1M DGA samples taken from open-source intelligence (OSINT) and DGArchive (DGArchive.caad.fkie.fraunhofer.de).
FIG. 4 compares the performance of a device 100 according to the present invention, which particularly implements the above-described deep learning model, with a device implementing a conventional model (e.g. an algorithm presented by Norwegian Computing Center). It can be seen that the device 100 according to an embodiment of the present invention shows a significantly improved performance with respect to both validation accuracy and validation loss over the conventional device. That is, the validation accuracy of the device 100 is considerably higher than of the conventional device, while its validation loss is considerably lower.
FIG. 5 shows a detection rate achieved by a device 100 according to an embodiment of the present invention for several DGAs. In particular, the table shown in FIG. 5 names the various DGAs, describes them briefly, and demonstrates that the probability of the device 100 detecting unseen domains produced with the various DGAs is consistently high. Further, the probability of false positives on non-malicious domains from unseen sources is low.
FIG. 6 shows a system 600 according to an embodiment of the present invention. The system 600 is particularly for detecting malicious domain names, especially malicious domain names generated by a DGA. The system 600 comprises a monitoring device 601 configured to monitor incoming DNS traffic 602 and to determine at least one FQDN 101 from the incoming DNS traffic 602. The system 600 also comprises a device 100 according to an embodiment of the present invention, as for example shown in FIG. 1, 2 or 3. The device 100 is configured to determine whether the determined FQDN 101 is malicious or not. This determination is achieved with the two-step process explained above.
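How a monitoring component might determine an FQDN from a captured DNS message can be sketched in Python as below. The function name fqdn_from_dns_query is hypothetical, and the sketch only reads the first question name of a standard DNS message (12-byte header followed by length-prefixed labels); name compression and packet capture itself are out of scope.

```python
import struct


def fqdn_from_dns_query(packet: bytes) -> str:
    """Extract the first question name from a raw DNS message (no compression handling)."""
    if len(packet) < 12:
        raise ValueError("DNS header is 12 bytes")
    # QDCOUNT is the 16-bit big-endian field at offset 4 of the header.
    (qdcount,) = struct.unpack("!H", packet[4:6])
    if qdcount == 0:
        raise ValueError("message carries no question")
    labels, pos = [], 12
    while True:
        if pos >= len(packet):
            raise ValueError("truncated question name")
        length = packet[pos]
        if length == 0:  # zero-length label terminates the name
            break
        if length & 0xC0:
            raise ValueError("compressed names are not handled in this sketch")
        labels.append(packet[pos + 1:pos + 1 + length].decode("ascii"))
        pos += 1 + length
    return ".".join(labels)
```

The returned name can then be passed on for the split into public suffix sequence and domain characters sequence.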
FIG. 7 shows that the system 600 (and thus respectively also the device 100) may be implemented in, or may even be, a host intrusion detection system (HIDS). The HIDS may be a Cloud Service provided to consumers. The HIDS may be composed of several plugins running on a common agent-based platform at the side of a Guest Virtual Machine (VM). The DGA plugin, i.e. the system 600, may run on top of the HIDS Agent platform, and may passively sniff the DNS traffic 602. Once a new DNS lookup is detected, the FQDN 101 is sent to a Cloud Botnet Detection Service, which includes the device 100. If the Cloud Botnet Detection Service detects a malicious domain, the HIDS waits for a certain threshold (e.g. 10 positively detected DGAs) and then blocks (or, alternatively, alerts on) the process that is the origin of the DGA traffic. In other words, the system 600 in the HIDS is configured to, after a number of FQDNs 101 has been determined to be malicious, wherein the number is above a determined threshold number, block a process that is an origin of the incoming DNS traffic 602, from which the FQDNs 101 were determined, or output an alert message.
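The threshold behaviour described above can be sketched in Python as a small per-process counter. The class name DgaGuard, the use of process ids as keys and the choice to count distinct FQDNs are assumptions of this sketch; the actual blocking or alerting action is left to the caller.

```python
from collections import defaultdict


class DgaGuard:
    """Tracks malicious verdicts per process and reports when to block or alert."""

    def __init__(self, threshold=10):
        self.threshold = threshold
        self.hits = defaultdict(set)  # process id -> distinct malicious FQDNs

    def observe(self, process_id, fqdn, is_malicious):
        """Return True once the process has exceeded the threshold."""
        if is_malicious:
            self.hits[process_id].add(fqdn)
        # "Above the threshold" is read here as strictly greater than it.
        return len(self.hits[process_id]) > self.threshold
```

A caller would feed each verdict into observe() and block the originating process, or raise an alert, as soon as it returns True.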
FIG. 8 shows a Cloud Botnet Detection Service, e.g. the one used in the system 600 of FIG. 7, including the device 100. The Cloud Botnet Detection Service may be part of a Galaxy Big Data and AI platform. Galaxy is responsible for data aggregation from multiple sources and its processing (including model building). A DGA Feed Aggregation component is responsible for data aggregation both for proven benign domains (e.g. Alexa, DMOZ, Huawei DNSaaS) and malicious domains (e.g. DGArchive, Malwaredomainlist, OSINT and more). The aggregated data is stored in the Big Data platform. A botnet detection service, implemented by or including the device 100 according to an embodiment of the present invention, is responsible for periodic training of the model described above. The trained model is used for inference on domain lists that are coming from HIDS agents.
FIG. 9 shows a method 900 according to an embodiment of the present invention. The method 900 may be carried out by a device 100 according to an embodiment of the present invention, as e.g. shown in FIG. 1, 2 or 3, or a system 600 as shown in FIG. 6 or FIG. 7. The method 900 comprises a step 901 of receiving, as an input, a FQDN 101 and a public suffix index 102. Further, it comprises a step 902 of determining a public suffix sequence 103 and a domain characters sequence 104 in the FQDN 101 based on the public suffix index 102. Further, it comprises a step 903 of processing the public suffix sequence 103 to obtain a first result 105 indicative of whether the FQDN 101 is malicious or not. Further, it comprises a step 904 of processing the domain characters sequence 104 to obtain a second result 106 indicative of whether the FQDN 101 is malicious or not. Finally, it comprises a step 905 of merging the first result 105 and the second result 106 and determining based on the merged result 107 whether the FQDN 101 is malicious or not.
The present invention has been described in conjunction with various embodiments as examples as well as implementations. However, other variations can be understood and effected by those persons skilled in the art and practicing the claimed invention, from the studies of the drawings, this disclosure and the independent claims. In the claims as well as in the description the word “comprising” does not exclude other elements or steps and the indefinite article “a” or “an” does not exclude a plurality. A single element or other unit may fulfill the functions of several entities or items recited in the claims. The mere fact that certain measures are recited in mutually different dependent claims does not indicate that a combination of these measures cannot be used in an advantageous implementation.
Claims
1. Device (100) for detecting malicious domain names, the device (100) being configured to
receive, as an input, a Fully-Qualified Domain Name, FQDN (101) and a public suffix index (102),
determine a public suffix sequence (103) and a domain characters sequence (104) in the FQDN (101) based on the public suffix index (102),
process the public suffix sequence (103) to obtain a first result (105) indicative of whether the FQDN (101) is malicious or not,
process the domain characters sequence (104) to obtain a second result (106) indicative of whether the FQDN (101) is malicious or not, and
merge the first result (105) and the second result (106) and determine based on the merged result (107) whether the FQDN (101) is malicious or not.
2. Device (100) according to claim 1, comprising
a first Long Short-Term Memory, LSTM, network (202) for processing the public suffix sequence (103), and/or
a second LSTM network (203) for processing the domain characters sequence (104).
3. Device (100) according to claim 2, wherein
the first LSTM network (202) and/or the second LSTM network (203) is a Recurrent Neural Network.
4. Device (100) according to one of the claims 1 to 3, wherein for processing the public suffix sequence (103), the device (100) is configured to
compute a probability that the public suffix sequence (103) and the domain character sequence (104) are used for a malicious FQDN (101) based on determined previous events.
5. Device (100) according to claim 4, further configured to
compute a probability that the public suffix sequence (103) is used by a Domain Generation Algorithm.
6. Device (100) according to claim 4 or 5, configured to
receive, as an input, a training set for learning the determined previous events.
7. Device (100) according to one of the claims 1 to 6, wherein for processing the domain characters sequence (104), the device (100) is configured to
calculate a probability that the domain characters sequence (104) is used for a malicious FQDN (101) based on a likelihood of one or more next characters in the domain characters sequence (104).
8. Device (100) according to one of the claims 1 to 7, wherein for determining whether the FQDN (101) is malicious or not, the device (100) is configured to
classify the merged result (107).
9. System (600) for detecting malicious domain names, the system (600) comprising
a monitoring device (601) configured to monitor incoming Domain Name System, DNS, traffic (602) and determine at least one FQDN (101) from the incoming DNS traffic (602), and
a device (100) according to one of the claims 1 to 8 configured to determine whether the determined FQDN (101) is malicious or not.
10. System (600) according to claim 9, configured to,
after a number of FQDNs (101) has been determined to be malicious, wherein the number is above a determined threshold number,
block a process that is an origin of the incoming DNS traffic (602), from which the FQDNs (101) were determined, or
output an alert message.
11. Method (900) for detecting malicious domain names, the method (900) comprising
receiving (901), as an input, a Fully-Qualified Domain Name, FQDN (101) and a public suffix index (102),
determining (902) a public suffix sequence (103) and a domain characters sequence (104) in the FQDN (101) based on the public suffix index (102),
processing (903) the public suffix sequence (103) to obtain a first result (105) indicative of whether the FQDN (101) is malicious or not,
processing (904) the domain characters sequence (104) to obtain a second result (106) indicative of whether the FQDN (101) is malicious or not, and
merging (905) the first result (105) and the second result (106) and determining based on the merged result (107) whether the FQDN (101) is malicious or not.
12. Computer program product comprising program code for controlling a device (100) according to one of the claims 1 to 8, or for performing, when implemented on a processor, a method (900) according to claim 11.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/EP2018/064092 WO2019228613A1 (en) | 2018-05-29 | 2018-05-29 | Device and method for detecting malicious domain names |
CN201880093939.3A CN112204930B (en) | 2018-05-29 | 2018-05-29 | Malicious domain name detection device, system and method |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2019228613A1 true WO2019228613A1 (en) | 2019-12-05 |
Family
ID=62528421
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/EP2018/064092 WO2019228613A1 (en) | 2018-05-29 | 2018-05-29 | Device and method for detecting malicious domain names |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN112204930B (en) |
WO (1) | WO2019228613A1 (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112261063A (en) * | 2020-11-09 | 2021-01-22 | 北京理工大学 | Network malicious traffic detection method combined with deep hierarchical network |
CN114970521A (en) * | 2022-05-18 | 2022-08-30 | 哈尔滨工业大学(威海) | Method for detecting DGA domain name based on domain name information |
CN115102714A (en) * | 2022-05-17 | 2022-09-23 | 中国科学院信息工程研究所 | Method and device for detecting malicious domain name based on dynamic evolution graph |
US11595357B2 (en) | 2019-10-23 | 2023-02-28 | Cisco Technology, Inc. | Identifying DNS tunneling domain names by aggregating features per subdomain |
US20240323223A1 (en) * | 2021-09-29 | 2024-09-26 | Infoblox Inc. | Detecting visual similarity between dns fully qualified domain names |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9043894B1 (en) * | 2014-11-06 | 2015-05-26 | Palantir Technologies Inc. | Malicious software detection in a computing system |
CN105827594A (en) * | 2016-03-08 | 2016-08-03 | 北京航空航天大学 | Suspicion detection method based on domain name readability and domain name analysis behavior |
US20180063168A1 (en) * | 2016-08-31 | 2018-03-01 | Cisco Technology, Inc. | Automatic detection of network threats based on modeling sequential behavior in network traffic |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101350032A (en) * | 2008-09-23 | 2009-01-21 | 胡辉 | Method for judging whether web page content is identical or not |
US9419986B2 (en) * | 2014-03-26 | 2016-08-16 | Symantec Corporation | System to identify machines infected by malware applying linguistic analysis to network requests from endpoints |
CN107547488B (en) * | 2016-06-29 | 2020-12-15 | 华为技术有限公司 | A kind of DNS tunnel detection method and DNS tunnel detection device |
-
2018
- 2018-05-29 CN CN201880093939.3A patent/CN112204930B/en active Active
- 2018-05-29 WO PCT/EP2018/064092 patent/WO2019228613A1/en active Application Filing
Non-Patent Citations (1)
Title |
---|
CHIBA DAIKI ET AL: "DomainChroma: Building actionable threat intelligence from malicious domain names", COMPUTERS & SECURITY, ELSEVIER SCIENCE PUBLISHERS. AMSTERDAM, NL, vol. 77, 6 April 2018 (2018-04-06), pages 138 - 161, XP085485739, ISSN: 0167-4048, DOI: 10.1016/J.COSE.2018.03.013 * |
Also Published As
Publication number | Publication date |
---|---|
CN112204930B (en) | 2022-03-01 |
CN112204930A (en) | 2021-01-08 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11558418B2 (en) | System for query injection detection using abstract syntax trees | |
Vinayakumar et al. | Scalable framework for cyber threat situational awareness based on domain name systems data analysis | |
Kumar et al. | Machine learning-based early detection of IoT botnets using network-edge traffic | |
US10270744B2 (en) | Behavior analysis based DNS tunneling detection and classification framework for network security | |
US20250094589A1 (en) | Detecting Microsoft Windows Installer malware using text classification models | |
Yin et al. | ConnSpoiler: Disrupting C&C communication of IoT-based botnet through fast detection of anomalous domain queries | |
Marchal et al. | PhishStorm: Detecting phishing with streaming analytics | |
CN112204930B (en) | Malicious domain name detection device, system and method | |
Zhao et al. | Malicious Domain Names Detection Algorithm Based on N‐Gram | |
Najafimehr et al. | DDoS attacks and machine‐learning‐based detection methods: A survey and taxonomy | |
Marchal et al. | Proactive discovery of phishing related domain names | |
Chu et al. | Protect sensitive sites from phishing attacks using features extractable from inaccessible phishing URLs | |
US11108794B2 (en) | Indicating malware generated domain names using n-grams | |
EP3948587B1 (en) | Graph stream mining pipeline for efficient subgraph detection | |
Mishra et al. | Out-VM monitoring for malicious network packet detection in cloud | |
Abraham et al. | Approximate string matching algorithm for phishing detection | |
Kumar et al. | Enhanced domain generating algorithm detection based on deep neural networks | |
Stiawan et al. | IoT botnet attack detection using deep autoencoder and artificial neural networks. | |
Qin et al. | An exploit kits detection approach based on HTTP message graph | |
Kumar et al. | Detection of phishing websites using an efficient machine learning framework | |
Qi et al. | Botcensor: Detecting dga-based botnet using two-stage anomaly detection | |
Thein et al. | Malicious Domain Detection Based on Decision Tree | |
Najafi et al. | Guilt-by-association: detecting malicious entities via graph mining | |
Spaulding et al. | Thriving on chaos: Proactive detection of command and control domains in internet of things‐scale botnets using DRIFT | |
Nguyen Quoc et al. | Detecting DGA botnet based on malware behavior analysis | |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| 121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 18729076; Country of ref document: EP; Kind code of ref document: A1 |
| NENP | Non-entry into the national phase | Ref country code: DE |
| 122 | EP: PCT application non-entry in European phase | Ref document number: 18729076; Country of ref document: EP; Kind code of ref document: A1 |