CN119675898B - Probabilistic prediction model training method, non-target domain name detection method and device - Google Patents
Abstract
The application provides a training method for a probability prediction model, and a method and device for detecting non-target domain names, and relates to the technical field of information security. The training method of the probability prediction model comprises: obtaining domain name data and preprocessing it to obtain an integer array of preset length; performing word embedding on the array to obtain first domain name data; obtaining a local key feature map of the first domain name data through a convolutional neural network; obtaining context features of the local key feature map through a gated recurrent unit; mapping the feature space of the local key feature map and the feature space of the context features to a one-dimensional sample label space through the convolutional neural network; and obtaining the prediction probability of the first domain name data according to the sample label space, so as to obtain, from the prediction probability, the probability that the first domain name data is a non-target domain name. This method solves the problem of high training and computation difficulty caused by machine learning models that fuse a convolutional neural network with a long short-term memory network.
Description
Technical Field
The application relates to the technical field of information security, and in particular to a training method for a probability prediction model, and a method and device for detecting non-target domain names.
Background
The Domain Name System (DNS) is a distributed database on the Internet that maps domain names to Internet Protocol (IP) addresses, allowing users to access the Internet conveniently without having to remember machine-readable IP address strings. DNS services reach into every corner of the Internet and have become an indispensable link and currently the most critical infrastructure on the Internet. As a core Internet service, the security of DNS is vital, because an attack on DNS or a security defect in it can cause immeasurable loss to the whole network. Beyond DNS security itself, malicious software is also increasingly common. The Domain Generation Algorithm (DGA), an important technique malicious software uses to evade detection, generates large numbers of pseudo-random domain names for contacting command-and-control servers. Because fixed domain names or IP addresses are easily detected and blocked, DGAs are applied in much malicious software to switch directly to new domain names, evading supervision without requiring regular releases of new versions or redeployment.
In the prior art, DGA detection is performed with a machine learning model that fuses a convolutional neural network (CNN) and a long short-term memory (LSTM) network, where the CNN extracts local features of the domain name and the LSTM extracts its context features. However, an LSTM network has many parameters, which makes training and computation of the machine learning model difficult.
Disclosure of Invention
The application provides a training method for a probability prediction model, and a method and device for detecting non-target domain names, to solve the problem of high training and computation difficulty caused by machine learning models that fuse a convolutional neural network with a long short-term memory network.
A first aspect of an embodiment of the present invention provides a training method for a probability prediction model, including:
acquiring domain name data in domain name system traffic, and preprocessing the domain name data to obtain an integer array of preset length;
performing word embedding on the integer array to obtain first domain name data, where the first domain name data is a low-dimensional dense matrix;
performing feature extraction and dimension reduction on the first domain name data through a convolutional neural network to obtain a local key feature map of the first domain name data;
obtaining context features of the local key feature map through a gated recurrent unit;
when the task is a binary classification task, mapping the feature space of the local key feature map and the feature space of the context features to a one-dimensional sample label space through the convolutional neural network, where the binary classification task indicates whether the first domain name data is a target domain name or a non-target domain name;
and obtaining the prediction probability of the first domain name data according to the one-dimensional sample label space, so as to obtain, from the prediction probability, the probability that the first domain name data is a non-target domain name.
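The final two steps above can be sketched as follows. This is an illustrative numpy sketch under stated assumptions, not the patented implementation: the feature dimension, the random weights, and the function name are hypothetical.

```python
import numpy as np

# Illustrative sketch (assumptions: 128-dim fused features, random weights):
# a layer maps the fused feature vector to the one-dimensional sample label
# space, and a sigmoid turns that scalar into the prediction probability.
rng = np.random.default_rng(0)

def predict_probability(features: np.ndarray, w: np.ndarray, b: float) -> float:
    """Map a feature vector to a scalar logit (the one-dimensional sample
    label space), then squash it to a probability in [0, 1]."""
    logit = float(features @ w + b)
    return 1.0 / (1.0 + np.exp(-logit))

features = rng.normal(size=128)   # hypothetical fused CNN + GRU features
w = rng.normal(size=128) * 0.01   # hypothetical learned weights
p = predict_probability(features, w, 0.0)
assert 0.0 <= p <= 1.0            # interpreted as P(non-target domain name)
```

A zero logit maps to a probability of exactly 0.5, which is why decision thresholds on either side of 0.5 are natural for such a sigmoid output.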
In one possible design, performing feature extraction and dimension reduction on the first domain name data through a convolutional neural network to obtain a local key feature map of the first domain name data includes:
performing a convolution operation on the first domain name data through a convolution layer of the convolutional neural network to obtain a feature map corresponding to the first domain name data;
and performing a downsampling operation on the feature map through a pooling layer of the convolutional neural network to obtain the local key feature map of the first domain name data.
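The convolution step can be sketched in numpy as a 1D character convolution over the embedded sequence. The kernel width, filter count, and ReLU activation below are illustrative assumptions not stated in this passage.

```python
import numpy as np

def conv1d_valid(x, kernels, bias):
    """x: (seq_len, embed_dim); kernels: (k, embed_dim, n_filters).
    Slides each kernel over the sequence ('valid' padding) and applies ReLU."""
    k, _, n_filters = kernels.shape
    seq_len = x.shape[0]
    out = np.zeros((seq_len - k + 1, n_filters))
    for t in range(seq_len - k + 1):
        window = x[t:t + k]  # a local character n-gram of the domain name
        out[t] = np.tensordot(window, kernels, axes=([0, 1], [0, 1])) + bias
    return np.maximum(out, 0.0)  # ReLU keeps salient activations

rng = np.random.default_rng(3)
x = rng.normal(size=(75, 128))                     # embedded domain name
kernels = rng.normal(scale=0.1, size=(3, 128, 64))  # hypothetical: width 3, 64 filters
fmap = conv1d_valid(x, kernels, np.zeros(64))
assert fmap.shape == (73, 64)                      # feature map for pooling
```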
In one possible design, the feature map is a matrix;
the performing a downsampling operation on the feature map through the pooling layer of the convolutional neural network to obtain the local key feature map of the first domain name data includes:
moving a pooling matrix of preset size over the feature map position by position to obtain a coverage area corresponding to each position, where each coverage area has the same size as the pooling matrix;
and taking the maximum of the values in each coverage area, and obtaining the local key feature map from the maxima of the coverage areas.
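The max-pooling procedure above can be illustrated with a small numpy example; the 2x2 window and stride of 2 are assumptions chosen for illustration.

```python
import numpy as np

def max_pool_2d(feature_map, size=2, stride=2):
    """Slide a size x size pooling window over the feature map and keep
    the maximum value of each coverage area."""
    h, w = feature_map.shape
    out = np.empty(((h - size) // stride + 1, (w - size) // stride + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            win = feature_map[i * stride:i * stride + size,
                              j * stride:j * stride + size]
            out[i, j] = win.max()  # maximum of the coverage area
    return out

fm = np.array([[1, 3, 2, 0],
               [4, 6, 1, 2],
               [7, 2, 9, 5],
               [0, 1, 3, 8]])
pooled = max_pool_2d(fm)
# pooled == [[6., 2.], [7., 9.]] -- each entry is the max of one 2x2 area
```

Each coverage area of the 4x4 feature map contributes one value, so the local key feature map is 2x2: the dimension is reduced while the strongest local activations survive.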
In one possible design, the domain name data includes a number of uppercase letters and a number of lowercase letters;
the preprocessing of the domain name data to obtain an integer array of preset length includes:
deleting repeated data and irrelevant data from the domain name data to obtain second domain name data;
padding the second domain name data with padding characters to obtain third domain name data, where the length of the third domain name data is the preset length;
converting each uppercase letter in the third domain name data into the corresponding lowercase letter to obtain fourth domain name data;
and performing dictionary mapping on the fourth domain name data to obtain the integer array of preset length.
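The four preprocessing steps can be sketched as follows. The patent later states a preset length of 75 and a vocabulary size of 41, but the exact alphabet, padding character, and dictionary below are hypothetical assumptions.

```python
# Hypothetical character dictionary: a padding symbol plus lowercase letters,
# digits, '-' and '.'; the real mapping in the patent is not specified here.
PAD = "*"
CHARSET = PAD + "abcdefghijklmnopqrstuvwxyz0123456789-."
CHAR_TO_INT = {c: i for i, c in enumerate(CHARSET)}

def preprocess(domains, preset_length=75):
    # Step 1: delete repeated data, preserving order (second domain name data).
    seen, unique = set(), []
    for d in domains:
        if d not in seen:
            seen.add(d)
            unique.append(d)
    arrays = []
    for d in unique:
        # Step 2: pad (or truncate) to the preset length (third domain name data).
        d = d[:preset_length].ljust(preset_length, PAD)
        # Step 3: convert uppercase letters to lowercase (fourth domain name data).
        d = d.lower()
        # Step 4: dictionary-map each character to an integer index.
        arrays.append([CHAR_TO_INT.get(c, 0) for c in d])
    return arrays

arrs = preprocess(["Example.com", "Example.com", "test.org"])
assert len(arrs) == 2 and len(arrs[0]) == 75  # duplicate dropped, fixed length
```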
In one possible design, the mapping, by the convolutional neural network, of the feature space of the local key feature map and the feature space of the context features to a one-dimensional sample label space includes:
mapping the feature space of the local key feature map and the feature space of the context features to the one-dimensional sample label space through a fully connected layer of the convolutional neural network.
A second aspect of an embodiment of the present invention provides a method for detecting a non-target domain name, including:
acquiring domain name data in domain name system traffic, and preprocessing the domain name data to obtain an integer array of preset length;
performing word embedding on the integer array to obtain first domain name data, where the first domain name data is a low-dimensional dense matrix;
and, when the first domain name data is neither in a preset blacklist nor in a preset whitelist, inputting the first domain name data into a preset probability prediction model to obtain the prediction probability of the first domain name data, so as to determine, according to the prediction probability, whether the first domain name data is a target domain name or a non-target domain name, where the probability prediction model is a model obtained by the training method provided in the first aspect of the embodiment of the invention.
In one possible design, the determining, according to the prediction probability, whether the first domain name data is a target domain name or a non-target domain name includes:
if the prediction probability is smaller than a first preset value, determining that the first domain name data is a target domain name; and if the prediction probability is larger than a second preset value, determining that the first domain name data is a non-target domain name, where the first preset value is smaller than the second preset value.
In one possible design, if the prediction probability is neither smaller than the first preset value nor larger than the second preset value, the method further includes:
acquiring the entropy value, the number of consonant letters, and the domain name length of the first domain name data;
and, if the entropy value is larger than a third preset value, the number of consonant letters is larger than a fourth preset value, and the domain name length is larger than a fifth preset value, determining that the first domain name data is a target domain name; otherwise, determining that the first domain name data is a non-target domain name.
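This rule-based fallback for the uncertain band between the two probability thresholds can be sketched with the standard library. The threshold values (the third, fourth, and fifth preset values) below are illustrative assumptions; the patent does not fix them.

```python
import math
from collections import Counter

CONSONANTS = set("bcdfghjklmnpqrstvwxyz")

def shannon_entropy(s: str) -> float:
    """Shannon entropy (bits per character) of the string's character distribution."""
    n = len(s)
    return -sum((c / n) * math.log2(c / n) for c in Counter(s).values())

def fallback_classify(domain: str,
                      t_entropy: float = 3.5,   # hypothetical third preset value
                      t_consonants: int = 7,    # hypothetical fourth preset value
                      t_length: int = 15) -> str:  # hypothetical fifth preset value
    label = domain.split(".")[0].lower()
    high_entropy = shannon_entropy(label) > t_entropy
    many_consonants = sum(ch in CONSONANTS for ch in label) > t_consonants
    long_name = len(label) > t_length
    # Per the text above: all three thresholds exceeded -> target domain name;
    # otherwise -> non-target domain name.
    return "target" if (high_entropy and many_consonants and long_name) else "non-target"

assert fallback_classify("google.com") == "non-target"
```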
A third aspect of an embodiment of the present invention provides a training device for a probability prediction model, including:
a preprocessing module, used to acquire domain name data in domain name system traffic and preprocess the domain name data to obtain an integer array of preset length;
a word embedding module, used to perform word embedding on the integer array to obtain first domain name data, where the first domain name data is a low-dimensional dense matrix;
a local feature extraction module, used to perform feature extraction and dimension reduction on the first domain name data through a convolutional neural network to obtain a local key feature map of the first domain name data;
a context feature extraction module, used to obtain context features of the local key feature map through a gated recurrent unit;
a mapping module, used to map, when the task is a binary classification task, the feature space of the local key feature map and the feature space of the context features to a one-dimensional sample label space through the convolutional neural network, where the binary classification task indicates whether the first domain name data is a target domain name or a non-target domain name;
and an output module, used to obtain the prediction probability of the first domain name data according to the one-dimensional sample label space, so as to obtain, from the prediction probability, the probability that the first domain name data is a non-target domain name.
A fourth aspect of an embodiment of the present invention provides a device for detecting non-target domain names, including:
a preprocessing module, used to acquire domain name data in domain name system traffic and preprocess the domain name data to obtain an integer array of preset length;
a word embedding module, used to perform word embedding on the integer array to obtain first domain name data, where the first domain name data is a low-dimensional dense matrix;
and a probability prediction module, used to input the first domain name data, when it is neither in a preset blacklist nor in a preset whitelist, into a preset probability prediction model to obtain the prediction probability of the first domain name data, and to determine, according to the prediction probability, whether the first domain name data is a target domain name or a non-target domain name, where the probability prediction model is a model obtained by the training device provided in the third aspect of the embodiment of the invention.
A fifth aspect of the embodiments of the present invention provides an electronic device, comprising a processor and a memory communicatively coupled to the processor;
The memory stores computer-executable instructions;
The processor executes the computer-executable instructions stored in the memory to implement the training method of the probability prediction model provided in the first aspect of the embodiment of the present invention, or the method for detecting non-target domain names provided in the second aspect.
A sixth aspect of the embodiments of the present invention provides a computer-readable storage medium storing computer-executable instructions that, when executed, implement the training method of the probability prediction model provided in the first aspect of the embodiment of the present invention, or the method for detecting non-target domain names provided in the second aspect.
A seventh aspect of the embodiments of the present invention provides a computer program product, including a computer program that, when executed by a processor, implements the training method of the probability prediction model provided in the first aspect of the embodiments of the present invention, or the method for detecting non-target domain names provided in the second aspect.
The application provides a training method for a probability prediction model, and a method and device for detecting non-target domain names. The training method comprises: obtaining domain name data in domain name system traffic, and preprocessing the domain name data to obtain an integer array of preset length; performing word embedding on the integer array to obtain first domain name data, where the first domain name data is a low-dimensional dense matrix; performing feature extraction and dimension reduction on the first domain name data through a convolutional neural network to obtain a local key feature map of the first domain name data; obtaining context features of the local key feature map through a gated recurrent unit; when the task is a binary classification task, mapping the feature space of the local key feature map and the feature space of the context features to a one-dimensional sample label space through the convolutional neural network, where the binary classification task indicates whether the first domain name data is a target domain name or a non-target domain name; and obtaining the prediction probability of the first domain name data according to the one-dimensional sample label space, so as to obtain, from the prediction probability, the probability that the first domain name data is a non-target domain name.
Based on this method, the convolutional neural network extracts local feature information from the domain name data and the gated recurrent unit extracts its context information, so that more accurate output can be obtained from both. Word embedding converts the discrete input domain name data into a low-dimensional dense matrix, so that the neural network can better understand the input data in subsequent tasks. Because context feature extraction is performed by a gated recurrent unit, the training difficulty of the model is reduced and computation becomes easier.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description show some embodiments of the present invention; a person skilled in the art can obtain other drawings from these drawings without inventive effort.
Fig. 1 is a flowchart of a training method of a probability prediction model according to an embodiment of the present application;
Fig. 2 is a second flowchart of a training method of a probability prediction model according to an embodiment of the present application;
Fig. 3 is a flowchart illustrating a method for detecting a non-target domain name according to an embodiment of the present application;
Fig. 4 is a second flow chart of a method for detecting a non-target domain name according to an embodiment of the present application;
fig. 5 is a schematic structural diagram of a training device for a probabilistic predictive model according to an embodiment of the present application;
fig. 6 is a schematic structural diagram of a non-target domain name detection device according to an embodiment of the present application;
Fig. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
In the embodiments of the present application, the words "first", "second", etc. are used to distinguish between identical or similar items that have substantially the same function and effect. Those skilled in the art will appreciate that the words "first", "second", etc. do not limit quantity or order of execution, and do not necessarily indicate a difference. It should be noted that, in the embodiments of the present application, words such as "exemplary" or "such as" are used to mean serving as an example, instance, or illustration. Any embodiment or design described herein as "exemplary" or "for example" should not be construed as preferred or advantageous over other embodiments or designs; rather, such words are intended to present related concepts in a concrete fashion. In the embodiments of the present application, "at least one" means one or more, and "a plurality" means two or more.
The term "when" in the embodiments of the present application may mean the instant at which a situation occurs, or a period of time after the situation occurs; this is not particularly limited. In addition, the model training method provided by the embodiment of the application is only an example; the model training method may include more or less content.
In order to facilitate the clear description of the technical solutions of the embodiments of the present application, the following simply describes some terms and techniques involved in the embodiments of the present application:
The convolutional neural network is one of the most successful algorithms in the field of deep learning and is widely applied to large-scale image processing tasks. Beyond image processing, convolutional neural networks can also be applied to text classification. Convolutional neural networks come in one-dimensional, two-dimensional, and three-dimensional variants: one-dimensional convolutional networks are mainly used for tasks such as sequence data processing, two-dimensional convolutional networks mainly for tasks such as text recognition, and three-dimensional convolutional networks mainly for tasks such as medical image and video data processing. Convolution layers and pooling layers can appear repeatedly in the hidden layers. The convolution layer extracts features from the input data; it contains multiple convolution kernels similar to the neurons of a feedforward neural network, each element of a kernel corresponding to a weight coefficient and a bias, and each neuron in the convolution layer is connected to neurons of the previous layer. In operation, the convolution layer periodically scans the input features to extract features from the input data. The pooling layer selects among the features produced by the convolution layer, reducing their dimension and filtering them. The fully connected layer performs nonlinear integration of the features extracted by the preceding layers and passes the integrated result to the final output layer. The output layer receives the hidden layers' data, gives the model's final classification result, and constrains the output through an activation function.
And the gating circulation unit is one kind of circulating neural network. As well as long and short term memory networks, have been proposed to solve the problems of long term memory and gradients in counter-propagation. Three gates are introduced into the long-short-period memory network to control the network, wherein the input gate control input value is responsible for selectively updating the memory unit, the gate control memory value is forgotten to determine which information is discarded from the unit state, and the gate control output value is output to determine the output hidden state. Compared with a long-period memory network with a complex structure, the gating circulation unit is provided with only two gates, and in order to simplify the internal structure and improve the operation efficiency, the gating circulation unit integrates the structures of an input gate, a forgetting gate and an output gate in the long-period memory network into an update gate and a reset gate. Wherein the reset gate determines how to combine the new input information with the previous memory and the update gate defines the amount of previous memory saved to the current time step.
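A single GRU time step can be sketched in numpy as follows. The weight names and the gating convention are assumptions (some references swap the roles of z and 1 - z); this is an illustration of the two-gate structure, not the patent's notation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x, h_prev, params):
    """One GRU time step with an update gate z and a reset gate r."""
    Wz, Uz, Wr, Ur, Wh, Uh = params
    z = sigmoid(x @ Wz + h_prev @ Uz)   # update gate: how much old memory to keep
    r = sigmoid(x @ Wr + h_prev @ Ur)   # reset gate: how to combine input with memory
    h_tilde = np.tanh(x @ Wh + (r * h_prev) @ Uh)  # candidate state
    return (1 - z) * h_prev + z * h_tilde          # new hidden state

rng = np.random.default_rng(1)
d_in, d_h = 8, 16  # hypothetical input and hidden sizes
params = (rng.normal(size=(d_in, d_h)), rng.normal(size=(d_h, d_h)),
          rng.normal(size=(d_in, d_h)), rng.normal(size=(d_h, d_h)),
          rng.normal(size=(d_in, d_h)), rng.normal(size=(d_h, d_h)))
h = gru_step(rng.normal(size=d_in), np.zeros(d_h), params)
assert h.shape == (d_h,)
```

With two gates instead of three, a GRU layer carries fewer weight matrices than an LSTM of the same hidden size, which is the parameter saving the application relies on.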
Domain name system security extension:
1) Domain Name System Security Extensions (DNSSEC) add cryptographic signatures to DNS records, helping to ensure security when data is transmitted over Internet Protocol (IP) networks. DNSSEC arose because the initial DNS architecture did not contain any security measures at the protocol level, so an attacker could try to direct users to a fraudulent website. To this end, DNSSEC was introduced in the industry to add a layer of authenticity and integrity protection to DNS responses.
2) DNSSEC is a security extension based on the DNS protocol. Its main idea is to provide source authentication and data integrity guarantees for DNS resolution traffic by adding cryptographic signatures to DNS records. In concrete implementations, DNSSEC uses a public key infrastructure (PKI) to verify the integrity and authenticity of the data. The owner or administrator of the domain name generates a public/private key pair; the public key is stored in a DNS record and the private key is kept secret. When responding to a DNS request, the domain name server computes a digest of the response content and encrypts it with the private key to generate a signature. After receiving the response content and the signature, the querier decrypts the digest value with the public key and compares it with a digest computed from the response content. This process ensures the integrity of the DNS response and the identity of the data source.
3) DNSSEC provides a security enhancement for the domain name system, offering verification of the identity of the domain name source, of data integrity, and of whether data was tampered with in transit, but it does not guarantee the confidentiality or availability of DNS data. Meanwhile, although DNSSEC provides trusted and secure DNS communication, it is not fully supported across the whole domain name system, mainly because the system is so large that not all nodes, from the root through the Top-Level Domains (TLD) to subordinate zones, can be guaranteed to support DNSSEC. DNSSEC implementations are therefore kept compatible with the existing domain name system, and DNSSEC requests and verification are automatically skipped when a DNS server does not support DNSSEC.
The DoT protocol (DNS over TLS, DoT) is based on the Transport Layer Security (TLS) protocol; it encrypts DNS queries and guarantees their privacy and security. TLS is a widely used secure encryption protocol that, compared with the traditional DNS protocol, already achieves confidentiality and integrity. Compared with DNSSEC, DoT provides confidentiality. In addition, compared with DNS encryption tools such as DNSCrypt, DoT has standardized Request for Comments (RFC) documents. However, applying DoT requires support from the client, while most mainstream browsers do not support DNS over TLS.
The DoH protocol (DNS over HTTPS, DoH) encapsulates DNS queries in the Hypertext Transfer Protocol Secure (HTTPS) protocol. By using encrypted HTTPS connections, it aims to increase user privacy and security while also helping to prevent DNS hijacking and snooping. Unlike DoT, it is HTTPS rather than bare TLS that carries the DNS queries: each query is encapsulated in an HTTPS request, and the encryption mechanism of HTTPS protects the data in transit. By combining DNS queries with HTTPS communication, DoH can leverage the existing Web infrastructure, making it easier to deploy and use in a variety of network environments. In practice, a DoH client encodes a single DNS query into an HTTPS request using the GET or POST method: with GET, the request parameter "dns" carries the content of the DNS request, encoded with base64; with POST, the DNS query is contained in the message body of the HTTPS request, and the Content-Type request header indicates the request type.
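The GET-method encoding described above can be sketched with the standard library. RFC 8484 specifies base64url without trailing padding for the "dns" parameter; the DNS query bytes and resolver URL below are placeholders.

```python
import base64

def doh_get_param(dns_query: bytes) -> str:
    """Encode a raw DNS query message for the 'dns' GET parameter
    (base64url, with trailing '=' padding stripped per RFC 8484)."""
    return base64.urlsafe_b64encode(dns_query).rstrip(b"=").decode("ascii")

# A minimal hypothetical DNS query blob, just to show the encoding step;
# 'dns.example' is a placeholder resolver host, not a real endpoint.
query = b"\x00\x01\x01\x00\x00\x01\x00\x00\x00\x00\x00\x00"
param = doh_get_param(query)
url = f"https://dns.example/dns-query?dns={param}"
assert "=" not in param  # padding stripped, as required for the GET form
```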
Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The implementations described in the following exemplary examples do not represent all implementations consistent with the application. Rather, they are merely examples of apparatus and methods consistent with aspects of the application as detailed in the accompanying claims.
The technical scheme of the invention is described in detail below by specific examples. The following embodiments may be combined with each other, and the same or similar concepts or processes may not be described in detail in some embodiments. Embodiments of the present invention will be described below with reference to the accompanying drawings.
For a clear understanding of the technical solutions of the present application, the prior art solutions will be described in detail first.
In the prior art, DGA detection is performed with a machine learning model that fuses a convolutional neural network and a long short-term memory network, where the local features of the domain name data and the context features of the domain name data are extracted. However, the long short-term memory network has many parameters, which makes training and computation of the machine learning model difficult.
Therefore, to solve the problem of high training and computation difficulty caused by performing DGA detection with a machine learning model that fuses a convolutional neural network and a long short-term memory network, the method comprises the steps of: ① inputting and preprocessing domain name data; ② extracting a local key feature map of the domain name data through a convolutional neural network; ③ extracting context features of the local key feature map through a gated recurrent unit; and ④ outputting a result according to the local key feature map and its context features.
Specifically: domain name data in domain name system traffic is obtained, and the domain name data is preprocessed to obtain an integer array of preset length;
word embedding is performed on the integer array to obtain first domain name data, where the first domain name data is a low-dimensional dense matrix;
feature extraction and dimension reduction are performed on the first domain name data through a convolutional neural network to obtain a local key feature map of the first domain name data;
context features of the local key feature map are obtained through a gated recurrent unit;
when the task is a binary classification task, the feature space of the local key feature map and the feature space of the context features are mapped to a one-dimensional sample label space through the convolutional neural network, where the binary classification task indicates whether the first domain name data is a target domain name or a non-target domain name;
and the prediction probability of the first domain name data is obtained according to the one-dimensional sample label space, so as to obtain, from the prediction probability, the probability that the first domain name data is a non-target domain name.
Based on this principle, the training method of the probability prediction model combines a convolutional neural network with a gated recurrent unit: the convolutional neural network extracts the local key feature map, the gated recurrent unit extracts the context features, and the final output is obtained from the local key feature map and the context features. Compared with a long short-term memory network, a machine learning model that fuses a convolutional neural network with a gated recurrent unit has fewer parameters, thereby avoiding the high training and computation difficulty caused by fusing a convolutional neural network with a long short-term memory network.
Based on the above creative findings, the technical solution of the present application is proposed.
The application scenario of the training method of the probability prediction model provided by the embodiment of the invention is described below.
The training method of the probability prediction model is applicable to many scenarios. In enterprise network security, it can protect internal enterprise networks from malicious websites and phishing attacks, and prevent sensitive data from leaking. In network traffic filtering, intercepting and analyzing DNS requests can block access to malicious or inappropriate content, improving network security and compliance. In malware protection, it can prevent malicious software from communicating or downloading through DNS requests, reducing infection. In data protection and privacy, it can safeguard user privacy when accessing the Internet and prevent data from being stolen or tracked. In network traffic monitoring, it can analyze and monitor network traffic to identify abnormal activities and potential security threats.
Embodiments of the present application will now be described with reference to the accompanying drawings.
Fig. 1 is a flowchart of a training method of a probabilistic predictive model according to an embodiment of the present application, where the model training method includes the following steps:
S101, acquiring domain name data in domain name system traffic, and preprocessing the domain name data to obtain integer arrays with preset lengths.
In this embodiment, the input layer of the model is used to obtain domain name data. After domain name data is acquired, the first step of machine learning modeling is data preprocessing, which is an important step for improving data quality and adaptability. The domain name data is converted into integer arrays with preset length, so that the input processing and training process of the model can be simplified, and meanwhile, the data loading and processing speed can be increased by the input with fixed length.
S102, word embedding is carried out on the integer array so as to obtain first domain name data.
In this embodiment, the first domain name data is a low-dimensional dense matrix. The word embedding process is accomplished through the embedding layer of the model. The integer array of preset length obtained by preprocessing is converted into a low-dimensional dense matrix, which is convenient for the neural network to operate on.
The principle of the embedding layer is that each integer index is mapped into a dense and continuous vector space through learning, and the mapping enables similar words to be closer in the vector space, so that semantic relations among the words can be better captured. During training of the model, these embedded vectors are optimized along with the network weights so that the model can better understand the input data in subsequent tasks.
In this model, the input sequence length of the embedding layer is 75 and the vocabulary size is 41. Each token is embedded into a 128-dimensional vector space through the embedding layer function provided by the Keras library in the TensorFlow platform, and the resulting character embedding vector sequence is output to the convolutional neural network for feature extraction.
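The lookup-table nature of an embedding layer can be illustrated without any deep learning framework. The sketch below is a minimal stand-in for the Keras embedding layer described above: the table is randomly initialised rather than learned, and the toy sequence is an assumption for illustration; only the sizes (vocabulary 41, dimension 128, sequence length 75) come from the text.

```python
import random

VOCAB_SIZE = 41   # vocabulary size from the text
EMBED_DIM = 128   # embedding dimension from the text
SEQ_LEN = 75      # preset sequence length from the text

random.seed(0)
# An embedding layer is essentially a learnable lookup table:
# one dense vector per integer index (here initialised randomly,
# whereas the real layer optimises these vectors during training).
embedding_table = [[random.uniform(-0.05, 0.05) for _ in range(EMBED_DIM)]
                   for _ in range(VOCAB_SIZE)]

def embed(indices):
    """Map an integer array to a (len(indices) x EMBED_DIM) dense matrix."""
    return [embedding_table[i] for i in indices]

sequence = [3, 17, 17, 0, 0]   # a toy integer array (0 = padding index, assumed)
dense = embed(sequence)
```

Note that identical indices map to identical rows, which is what lets the network learn one shared representation per character.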
And S103, performing feature extraction and dimension reduction on the first domain name data through a convolutional neural network to obtain a local key feature map of the first domain name data.
In this embodiment, the convolutional neural network is used to extract local key features in domain name data.
S104, obtaining the context characteristics of the local key characteristic diagram through a gating circulation unit.
In this embodiment, the gated recurrent unit (GRU) is a type of recurrent neural network that can be used to extract the contextual features of data, so it performs well in processing and predicting sequence data. The GRU is similar to the long short-term memory (LSTM) network and is equally effective in solving the long-range dependence problem of recurrent neural networks. The update gate of the GRU can be regarded as a combination of the input gate and the forget gate of the LSTM: the update gate controls how much of the state from the previous time step is carried into the current state, and the reset gate controls how much of the previous state is used when computing the candidate state.
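For reference, the gate mechanism described above can be written out explicitly. The notation below is not from the original text; it is the standard GRU formulation, with update gate $z_t$, reset gate $r_t$, input $x_t$ and previous hidden state $h_{t-1}$:

```latex
\begin{aligned}
z_t &= \sigma\!\left(W_z x_t + U_z h_{t-1} + b_z\right) \\
r_t &= \sigma\!\left(W_r x_t + U_r h_{t-1} + b_r\right) \\
\tilde{h}_t &= \tanh\!\left(W_h x_t + U_h \left(r_t \odot h_{t-1}\right) + b_h\right) \\
h_t &= \left(1 - z_t\right) \odot h_{t-1} + z_t \odot \tilde{h}_t
\end{aligned}
```

Conventions differ on whether $z_t$ or $1-z_t$ weights the previous state; the two choices are equivalent up to relabeling the gate.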
After the convolutional neural network performs feature extraction and dimension reduction on the vector sequence, a gated recurrent unit layer with 128 neurons performs deeper contextual feature extraction on the data. Finally, the result obtained by the GRU layer is output to the output layer.
And S105, when the task is a classification task, mapping the feature space of the local key feature map and the feature space of the context feature to a one-dimensional sample marking space through a convolutional neural network.
In this embodiment, this is achieved by the fully connected layer of the convolutional neural network. The fully connected layer is generally located at the end of the whole convolutional neural network and is responsible for flattening the feature map (matrix) produced by the last convolutional layer into a one-dimensional vector, providing input for the classifier and thereby realizing an end-to-end learning process. Each node of the fully connected layer is connected to all nodes of the previous layer, hence the name. Due to this full connectivity, the fully connected layer also generally has the most parameters.
The function of the fully connected layer is to integrate the local key feature map and the contextual features into one global representation and to make the final classification or regression. The method helps the model to capture higher abstract features by flattening the feature map and performing linear transformation, so as to realize final decision.
Specifically, the fully connected layer integrates the feature space of the local key feature map and the feature space of the context features to obtain an integrated feature vector, maps the feature vector to the target task label space, and outputs it to the output layer for final classification. The task is a binary classification task used to indicate whether the first domain name data is a target domain name or a non-target domain name. The target domain name here is a DGA domain name, and the non-target domain name is a non-DGA domain name, i.e. a benign domain name.
In the algorithm designed in the present application, the fully connected layer uses tanh as the activation function to map the feature space computed by each preceding layer to the target task label space, which is then output to the output layer for final classification.
While mapping the feature space computed by the preceding layers into the sample label space, this layer can also reduce the influence of feature position on the classification result and improve the robustness of the whole network.
S106, according to the one-dimensional sample mark space, obtaining the prediction probability of the first domain name data, so that the probability that the first domain name data is a non-target domain name can be obtained through the prediction probability.
In this embodiment, the output layer is the last layer in the neural network model, and is responsible for generating the final output of the model. It maps the feature representation of the previous hierarchy into the final prediction result or decision. The form of the output layer may vary in different tasks and network architectures.
In the classification task, the output layer usually has only one node, the activation function is sigmoid, and the output is a prediction probability between 0 and 1, and the prediction probability is used for representing the probability that the first domain name data is a non-target domain name.
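The output layer described above is a single sigmoid node. A minimal sketch of that squashing behaviour (the logit value 0.0 is an arbitrary example, not from the text):

```python
import math

def sigmoid(x: float) -> float:
    """Squash any real-valued logit into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

# The single output node maps the logit to a probability that the
# input is a non-target (benign) domain name.
p = sigmoid(0.0)
```

A logit of exactly 0 corresponds to the decision boundary, probability 0.5; large negative logits push the probability toward 0 and large positive logits toward 1.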
The application provides a training method of a probability prediction model, and a non-target domain name detection method and device. The training method of the probability prediction model comprises: obtaining domain name data in domain name system traffic, and preprocessing the domain name data to obtain an integer array of preset length; performing word embedding on the integer array to obtain first domain name data, where the first domain name data is a low-dimensional dense matrix; performing feature extraction and dimension reduction on the first domain name data through a convolutional neural network to obtain a local key feature map of the first domain name data; obtaining context features of the local key feature map through a gated recurrent unit; when the task is a binary classification task, mapping the feature space of the local key feature map and the feature space of the context features to a one-dimensional sample label space through the convolutional neural network, where the binary classification task indicates whether the first domain name data is a target domain name or a non-target domain name; and obtaining the prediction probability of the first domain name data according to the one-dimensional sample label space, so as to obtain, through the prediction probability, the probability that the first domain name data is a non-target domain name.
Based on the method, the following technical effects are achieved: local feature information in domain name data is extracted by the convolutional neural network, context information is extracted by the gated recurrent unit, and more accurate output can be obtained from the local feature information and the context information; word embedding converts the input discrete domain name data into a low-dimensional dense matrix, so that the neural network can better understand the input data in subsequent tasks; and contextual feature extraction with a gated recurrent unit reduces the training difficulty of the model and makes computation easier.
Fig. 2 is a second flowchart of a training method of a probabilistic predictive model according to an embodiment of the present application. As shown in fig. 2, the training method of the probability prediction model provided in this embodiment is further refined on the basis of the training method of the probability prediction model provided in the previous embodiment of the present application. The training method of the probability prediction model provided in this embodiment includes the following steps:
S201, deleting repeated data and irrelevant data in the domain name data after obtaining the domain name data in the domain name system flow, and obtaining second domain name data.
In this embodiment, the second domain name data is the domain name data after the duplicate data and the irrelevant data have been deleted. Step S201 cleans the obtained domain name data so as to avoid the influence of duplicate data and irrelevant data on the output result.
S202, filling the second domain name data through filling characters to obtain third domain name data.
In this embodiment, the length of the third domain name data is the preset length. Fill characters are appended to the domain name data (after deleting the duplicate and irrelevant data) until it reaches the preset length. Domain name data of preset length simplifies the input processing and training process of the model and speeds up data loading and processing. Specifically, in this embodiment, the domain name data is a character string and the preset length is 75, i.e., the third domain name data is a character string of length 75.
S203, converting each capital letter in the third domain name data into a corresponding lowercase letter to obtain fourth domain name data.
In this embodiment, the domain name data is a character string containing a plurality of uppercase and lowercase letters. All uppercase letters in the string are converted into lowercase letters to obtain the fourth domain name data, i.e., a character string of length 75 in which all letters are lowercase.
S204, performing dictionary mapping on the fourth domain name data to obtain integer arrays with preset lengths.
In this embodiment, the character string is converted into an integer array, because the integer array is generally easier to process by a computer than the character string data. Therefore, the processing mode can simplify the input of the model, and further improve the calculation efficiency.
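Steps S202–S204 can be sketched in a few lines of Python. The fill convention (index 0 for padding), the exact alphabet, and the order of operations below are assumptions for illustration; only the preset length 75 comes from the text, and the assumed alphabet here has 39 symbols rather than the 41-entry vocabulary the model uses.

```python
# Hypothetical preprocessing sketch for steps S202-S204:
# lowercase, dictionary-map each character, then pad to the preset length.
PRESET_LEN = 75
ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789-._"   # assumed alphabet
CHAR2INT = {c: i + 1 for i, c in enumerate(ALPHABET)}  # 0 reserved for padding

def preprocess(domain: str) -> list[int]:
    domain = domain.lower()[:PRESET_LEN]             # S203: to lowercase (truncate if over-long)
    ints = [CHAR2INT.get(c, 0) for c in domain]      # S204: dictionary mapping
    return ints + [0] * (PRESET_LEN - len(ints))     # S202: pad to preset length
```

Because lowercasing happens before mapping, `"ExAmple.COM"` and `"example.com"` produce the same integer array, which is exactly the invariance step S203 is after.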
S205, word embedding is carried out on the integer array so as to obtain first domain name data.
In this embodiment, the first domain name data is a low-dimensional dense matrix. The operation and manner of S205 are similar to those of S102 in the previous embodiment of the present invention and are not repeated here.
S206, performing convolution operation on the first domain name data through a convolution layer of the convolution neural network to obtain a feature map corresponding to the first domain name data.
In this embodiment, the convolutional layer is a very important layer in the convolutional neural network. Its basic idea is to extract local features of the input data through a convolution operation and use these features for further processing and analysis. A convolution operation typically scans the input data with a filter (convolution kernel) and generates a corresponding feature map.
S207, move a pooling matrix of preset size over the feature map position by position; after obtaining the coverage area corresponding to each position, take the maximum of the values in each coverage area, and obtain the local key feature map from the maximum value of each coverage area.
In this embodiment, the size of each coverage area is the same as the size of the pooling matrix, and this is done by the pooling layer in the convolutional neural network.
The pooling layer is another important layer in the convolutional neural network. It is mainly used to downsample the input feature map, reducing its dimension, reducing the number of parameters and preventing overfitting. Common pooling approaches include max pooling and average pooling, which respectively select the maximum or the average value within each small window on the feature map as the output of that window, thereby yielding a new pooled feature map. With a pooling window of size 2, the input feature map is downsampled to half its original size, further reducing the computational effort. The pooling mode adopted by the application is max pooling.
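The max-pooling step above can be shown on a toy 1-D feature map. The numbers are arbitrary; the window size 2 matches the configuration described in the text.

```python
# Minimal 1-D max-pooling sketch (window size 2, stride 2): keep the
# maximum of each non-overlapping window, halving the feature map length.
def max_pool_1d(feature_map, window=2):
    return [max(feature_map[i:i + window])
            for i in range(0, len(feature_map) - window + 1, window)]

pooled = max_pool_1d([1, 3, 2, 0, 5, 4])
```

The output keeps only the strongest activation per window, which is why the pooled map is robust to small shifts in feature position.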
In the algorithm designed herein, the convolutional neural network consists of a convolutional layer with a convolution kernel size of 3 and 128 filters, and a max-pooling layer with a window size of 2. The convolutional neural network performs feature extraction and dimension reduction on the low-dimensional dense matrix processed by the embedding layer, and outputs the resulting feature map to the gated recurrent unit layer.
S208, obtaining the context characteristics of the local key characteristic diagram through a gating circulating unit.
In this embodiment, the effect and manner of the gated recurrent unit in S208 are similar to those of the gated recurrent unit in S104 in the previous embodiment of the present invention and are not repeated here.
S209, when the task is a classification task, mapping the feature space of the local key feature map and the feature space of the context feature to a one-dimensional sample marking space through a convolutional neural network.
In this embodiment, the action and the manner of S209 are similar to those of S105 in the previous embodiment of the present invention, and will not be described here again.
S210, according to the one-dimensional sample mark space, obtaining the prediction probability of the first domain name data, so that the probability that the first domain name data is a non-target domain name can be obtained through the prediction probability.
In this embodiment, the operation and the manner of S210 are similar to those of S106 in the previous embodiment of the present invention, and will not be described herein.
In this embodiment, data related to the system is recorded, mainly the key content of malicious DNS requests and the blacklist/whitelist; this component is responsible for reading, writing and managing the database. Specifically, the related information is stored in an SQLite database. SQLite is a powerful, lightweight, embedded relational database management system that requires little configuration, installation and management compared with other databases. The database mainly comprises three tables: the malicious DNS log, the blacklist and the whitelist. The dns_log table field format is shown in Table 1.
TABLE 1
The white_list table field format is shown in Table 2.
TABLE 2
The black_list table field format is shown in table 3.
TABLE 3
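Since the field formats of Tables 1–3 are not reproduced here, the following sketch shows only the general shape of the three-table SQLite schema; every column name below is a hypothetical assumption, not the patent's actual schema.

```python
import sqlite3

# Hypothetical schema sketch for the dns_log / white_list / black_list
# tables described in the text. Column names are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dns_log   (id INTEGER PRIMARY KEY, domain TEXT, query_time TEXT, action TEXT);
CREATE TABLE white_list(id INTEGER PRIMARY KEY, domain TEXT UNIQUE);
CREATE TABLE black_list(id INTEGER PRIMARY KEY, domain TEXT UNIQUE);
""")
conn.execute("INSERT INTO black_list(domain) VALUES (?)", ("abcxyz123.com",))
rows = conn.execute("SELECT domain FROM black_list").fetchall()
```

As the text notes, SQLite needs no separate server process, so the gateway can embed the whole database in a single file (here `:memory:` for the sketch).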
In terms of DGA detection, a comparison experiment between the CNN-GRU-based deep learning model and machine learning models based on manually extracted features was carried out. The experimental results show that the DGA detection technique combining CNN and GRU designed and implemented in this application is significantly superior to the other algorithms in the comparison experiments in metrics such as accuracy, precision, recall and F1 score, and has better performance.
From the perspective of model design, the deep learning model combining CNN and GRU can automatically learn effective features of the input data without manual feature extraction; its multiple hidden layers can learn more abstract, higher-level features and thus solve more complex problems. The GRU layer in the algorithm performs well in processing and predicting sequence data and can better extract the contextual features of domain name data, thereby improving the detection performance of the model.
In terms of system design and implementation, the application constructs a secure DNS gateway system that can detect and intercept potential threats. The secure DNS gateway supports using machine learning algorithms to inspect DNS response messages and detect whether a device is subject to DGA domain name attacks. If a DGA domain name is detected, the system records the relevant threat log and intercepts the domain name according to the configured defense rules. For privacy protection, the DNS security gateway can proxy DNS requests and send secure DNS queries to the upstream DNS server to enhance the security and privacy of DNS requests. In summary, the design and implementation of a machine-learning-based secure DNS gateway has important research significance for improving network security, protecting user privacy, coping with malware threats, improving network management efficiency, and promoting the application of machine learning in network security.
The training method of the probability prediction model and the non-target domain name detection method and device have the following advantages: local feature information in domain name data is extracted through a convolutional neural network and context information is extracted through a gated recurrent unit, so more accurate output can be obtained from both; word embedding converts the input discrete domain name data into a low-dimensional dense matrix, so that the neural network can better understand the input data in subsequent tasks; contextual feature extraction with a gated recurrent unit reduces the training difficulty of the model and makes computation easier; the prediction probability that domain name data is a DGA domain name is obtained, and potential DGA domains are intercepted according to the prediction probability; and the training and use of the model can automatically learn effective features of the input data, thereby avoiding manual feature extraction.
Fig. 3 is a flowchart of a non-target domain name detection method according to an embodiment of the present application, as shown in fig. 3, where the non-target domain name detection method according to the embodiment includes the following steps:
S301, acquiring domain name data in domain name system traffic, and preprocessing the domain name data to obtain integer arrays with preset lengths.
In this embodiment, the action and the manner of S301 are similar to those of S101 in the first embodiment of the present invention, and are not described here again.
S302, word embedding is conducted on the integer array so as to obtain first domain name data.
In this embodiment, the first domain name data is a low-dimensional dense matrix. The operation and the manner of S302 are similar to those of S102 in the first embodiment of the present invention, and are not described herein.
S303, when the first domain name data is neither in a preset blacklist nor in a preset whitelist, inputting the first domain name data into a preset probability prediction model to obtain the prediction probability of the first domain name data, so as to determine whether the first domain name data is a target domain name or a non-target domain name according to the prediction probability.
In this embodiment, blacklist/whitelist detection is the first stage of domain name detection. When the domain name system security gateway system is initialized, the blacklist and the whitelist in the database are converted into regular expressions and loaded into memory. When the first domain name data is detected, it is first judged whether the first domain name data is in the blacklist; if so, the first domain name data is immediately handled as a DGA domain name. Otherwise, it is judged whether the first domain name data is in the whitelist; if so, no subsequent detection is performed.
The blacklist is dynamically updated with detected target domain names, and the whitelist is composed of preset domain names. Specifically, the initial whitelist is composed of top-ranked Alexa domain names, and the blacklist is dynamically updated with detected DGA domain names. In addition, the system allows the user to add, delete and modify the blacklist and whitelist.
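The first-stage check described above can be sketched with compiled regular expressions, mirroring the "convert the lists to regexes and load them into memory" design. The example domains and patterns below are assumptions for illustration.

```python
import re

# Sketch of the first-stage blacklist/whitelist check. The lists here are
# illustrative stand-ins for the ones loaded from the SQLite database.
blacklist = [r"^baddomain\.com$", r"\.evil\.net$"]
whitelist = [r"^google\.com$", r"^wikipedia\.org$"]
black_re = [re.compile(p) for p in blacklist]
white_re = [re.compile(p) for p in whitelist]

def first_stage(domain: str) -> str:
    if any(p.search(domain) for p in black_re):
        return "dga"        # blacklisted: handle as DGA immediately
    if any(p.search(domain) for p in white_re):
        return "benign"     # whitelisted: skip further detection
    return "model"          # neither list: fall through to the probability model
```

Pre-compiling the patterns at startup means each incoming query costs only a scan over the compiled list, not a fresh regex parse.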
When the first domain name data is neither in the blacklist nor in the whitelist, the first domain name data is input into a preset probability prediction model to obtain the prediction probability of the first domain name data, so that whether the first domain name data is a target domain name or a non-target domain name can be determined according to the prediction probability. In this embodiment, the probabilistic predictive model is a model obtained by the training method of the probabilistic predictive model in the first or second embodiment.
When the domain name data is neither in the blacklist nor in the whitelist, probability detection is performed on the domain name data through the probability prediction model. The model is trained through the functions provided by the TensorFlow framework, and the trained model (with its parameters) is stored on disk in the SavedModel format for use by the monitoring module. After the domain name system gateway service is started, the load_model method of the TensorFlow framework loads the model into memory; each time the probability of a piece of domain name data is to be detected, the domain name data is preprocessed and then input into the model for prediction. The prediction result of the model is a value between 0 and 1, representing the probability that the domain name is a non-target domain name; when this value is smaller than 0.1, the domain name data can be regarded as a domain generation algorithm (DGA) domain name and handled accordingly.
When the input domain name data is detected to be the target domain name, i.e., the detection result indicates that the domain name data in the traffic is a DGA domain name, an interception or redirection operation is performed on it. For interception, a domain name system response packet with a response status code of REFUSED is constructed and returned to the client; for redirection, a response packet is constructed whose IP address is a preset IP address, and the response packet is returned to the client.
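The REFUSED interception can be illustrated at the wire level. The sketch below builds only the 12-byte DNS header with RCODE=5 (REFUSED, per RFC 1035); a real response would also echo the question section, and the query ID used here is an arbitrary example.

```python
import struct

REFUSED = 5  # DNS RCODE for a refused query (RFC 1035)

def refused_response_header(query_id: int) -> bytes:
    """Build a minimal 12-byte DNS header carrying RCODE=REFUSED.

    A sketch only: a complete interception response would also copy the
    question section from the client's query.
    """
    # Flags word: QR=1 (this is a response), Opcode=0, AA/TC/RD/RA=0,
    # Z=0, RCODE=5 (REFUSED). Question/answer/authority/additional = 0.
    flags = (1 << 15) | REFUSED
    return struct.pack("!HHHHHH", query_id, flags, 0, 0, 0, 0)

hdr = refused_response_header(0x1234)
```

Echoing the client's query ID in the first two bytes is what lets the client match the refusal to its outstanding request.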
The domain name system traffic interception function is mainly realized based on the Python socket library. The socket library is a basic library for network programming and provides functions such as data sending and receiving. Domain name system traffic to the gateway can be intercepted through port 53 of the host on which the gateway is deployed. After obtaining domain name system traffic from the client at port 53, the traffic broker module creates a new thread to invoke process_request() to detect and filter the traffic and make a request to the upstream domain name system server. Finally, the response traffic is sent back to the client to complete one domain name system response.
If the detection result shows that the domain name data in the traffic is a non-target domain name, the traffic is not filtered, but the normal domain name system traffic agent or the safe domain name system traffic agent is carried out according to the preset configuration.
After the common domain name system traffic proxy detects domain name system traffic, it calls the resolve() function in the dns.resolver library to query the pre-configured upstream domain name system server; after the upstream server's response is obtained, the result is packaged into a domain name system response packet and returned to the client.
After the secure domain name system traffic proxy mode is configured, the domain name data is detected, the domain name and record type in the domain name system traffic are extracted, a uniform resource locator (URL) pointing to the upstream DoH server is constructed, a DoH query is performed, and the query result is repackaged into a DNS response packet and returned to the client.
The embodiment of the application has the technical effect that, through the trained machine learning model fusing a convolutional neural network and a gated recurrent unit, effective features of the input data can be learned and extracted automatically, thereby avoiding manual feature extraction.
Fig. 4 is a second flowchart of a method for detecting a non-target domain name according to an embodiment of the present application. As shown in fig. 4, the method for detecting a non-target domain name according to the present embodiment is further refined based on the method for detecting a non-target domain name according to the previous embodiment of the present application. The method for detecting the non-target domain name provided in the embodiment includes the following steps:
S401, acquiring domain name data in domain name system traffic, and preprocessing the domain name data to obtain integer arrays with preset lengths.
In this embodiment, the operation and the manner of S401 are similar to those of S301 in the previous embodiment of the present invention, and will not be described herein.
S402, word embedding is conducted on the integer array so as to obtain first domain name data.
In this embodiment, the first domain name data is a low-dimensional dense matrix. The operation and manner of S402 are similar to those of S302 in the previous embodiment of the present invention, and will not be described herein.
S403, when the first domain name data is neither in a preset blacklist nor in a preset whitelist, inputting the first domain name data into a preset probability prediction model to obtain the prediction probability of the first domain name data.
In this embodiment, the effect and manner of the probability prediction model in S403 are similar to those of S303 in the previous embodiment of the present invention and are not repeated here.
S404, if the prediction probability is smaller than a first preset value, determining that the first domain name data is a target domain name, and if the prediction probability is larger than a second preset value, determining that the first domain name data is a non-target domain name.
Specifically, in this embodiment, the first preset value is 0.1, the second preset value is 0.2, if the obtained prediction probability is less than 0.1, the first domain name data is a DGA domain name, and if the obtained prediction probability is greater than 0.2, the first domain name data is a non-DGA domain name.
S405, if the prediction probability is not smaller than the first preset value and not larger than the second preset value, obtaining the entropy value, the consonant letter number and the domain name length of the first domain name data.
In this embodiment, if the obtained prediction probability is not less than 0.1 and not more than 0.2, the traditional DGA detection method is used as an auxiliary means to the other two detection methods. The traditional DGA detection method mainly detects the domain name by analyzing domain name characteristics, using three indicators: the entropy of the domain name, the number of consonant letters, and the domain name length.
S406, if the entropy value is larger than the third preset value, the number of consonant letters is larger than the fourth preset value and the domain name length is larger than the fifth preset value, determining the first domain name data as the target domain name, otherwise, determining the first domain name data as the non-target domain name.
Specifically, in this embodiment, the third preset value is 3.8, the fourth preset value is 7, and the fifth preset value is 12. When the entropy value of the first domain name data is greater than 3.8, the number of consonants is greater than 7, and the domain name length is greater than 12, the first domain name data can be regarded as a DGA domain name and a corresponding treatment can be made.
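The combined decision rule of S404–S406 can be sketched end to end. The threshold values (0.1, 0.2, 3.8, 7, 12) come from the text; the Shannon-entropy helper, the consonant set, and the example domains are assumptions for illustration.

```python
import math

# Sketch of the combined decision rule from S404-S406.
CONSONANTS = set("bcdfghjklmnpqrstvwxyz")

def shannon_entropy(s: str) -> float:
    """Shannon entropy (bits per character) of a string."""
    counts = {c: s.count(c) for c in set(s)}
    return -sum(n / len(s) * math.log2(n / len(s)) for n in counts.values())

def classify(domain: str, p: float) -> str:
    if p < 0.1:
        return "dga"        # S404: prediction probability below first preset value
    if p > 0.2:
        return "benign"     # S404: prediction probability above second preset value
    # S405-S406: borderline probability, fall back to feature-based check
    suspicious = (shannon_entropy(domain) > 3.8
                  and sum(c in CONSONANTS for c in domain) > 7
                  and len(domain) > 12)
    return "dga" if suspicious else "benign"

# A high-entropy, consonant-heavy example domain with a borderline probability
label = classify("qx7zk2pwv9rt.biz", 0.15)
```

All three heuristic conditions must hold simultaneously, so the fallback only fires on domains that look random by every measure at once.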
The method and device have the technical effect that, when DGA domain name detection is carried out through the trained machine learning model fusing a convolutional neural network and a gated recurrent unit, whether the domain name data is a target domain name or a non-target domain name is judged through the prediction probability of the domain name data, and the calculation efficiency is high.
Fig. 5 is a schematic structural diagram of a training device for a probabilistic predictive model according to an embodiment of the present application. As shown in fig. 5, in this embodiment, the training apparatus 500 of the probabilistic predictive model may be located in an electronic device. The training device 500 for the probabilistic predictive model includes:
The preprocessing module 501 is configured to obtain domain name data in domain name system traffic, and preprocess the domain name data to obtain an integer array with a preset length;
the word embedding module 502 is configured to perform word embedding on the integer array so as to obtain first domain name data, where the first domain name data is a low-dimensional dense matrix;
The local feature extraction module 503 is configured to perform feature extraction and dimension reduction on the first domain name data through a convolutional neural network, so as to obtain a local key feature map of the first domain name data;
A context feature extraction module 504, configured to obtain context features of the local key feature map through the gating cycle unit;
The mapping module 505 is configured to map, when the task is a binary classification task, the feature space of the local key feature map and the feature space of the context features to a one-dimensional sample label space through the convolutional neural network, where the binary classification task is used to indicate whether the first domain name data is a target domain name or a non-target domain name;
And an output module 506, configured to obtain a prediction probability of the first domain name data according to the one-dimensional sample tag space, so as to obtain a probability that the first domain name data is a non-target domain name through the prediction probability.
The training device for the probabilistic predictive model provided in this embodiment may execute the technical scheme of the training method embodiment of the probabilistic predictive model shown in fig. 1, and its implementation principle and technical effects are similar to those of the training method embodiment of the probabilistic predictive model shown in fig. 1, and are not described in detail herein.
Meanwhile, the training device 500 for the probability prediction model provided in this embodiment further refines the training device for the probability prediction model provided in the previous embodiment.
Optionally, in this embodiment, the preprocessing module 501 is specifically configured to:
Deleting duplicate data and irrelevant data from the domain name data to obtain second domain name data, where the domain name data comprises a number of uppercase letters and a number of lowercase letters;
filling the second domain name data through filling characters to obtain third domain name data, wherein the length of the third domain name data is a preset length;
Converting each capital letter in the third domain name data into a corresponding lowercase letter to obtain fourth domain name data;
And performing dictionary mapping on the fourth domain name data to obtain integer arrays with preset lengths.
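The four preprocessing steps handled by the preprocessing module 501 (deduplication, padding to the preset length, lowercase conversion, and dictionary mapping) can be sketched as below. The preset length of 64, the filler character, and the character dictionary are illustrative assumptions, since the patent does not fix their concrete values:

```python
PRESET_LENGTH = 64          # assumed preset length
PAD_CHAR = "*"              # assumed filler character

# Hypothetical character dictionary: filler -> 0, then domain-name characters.
VOCAB = {PAD_CHAR: 0}
for i, ch in enumerate("abcdefghijklmnopqrstuvwxyz0123456789-._", start=1):
    VOCAB[ch] = i

def preprocess(domains):
    """Follow the described step order: remove exact duplicates (before case
    folding), pad each domain to the preset length (third domain name data),
    lowercase it (fourth domain name data), then dictionary-map each
    character to an integer."""
    unique = list(dict.fromkeys(domains))  # drop duplicate data, keep order
    arrays = []
    for d in unique:
        padded = d[:PRESET_LENGTH].ljust(PRESET_LENGTH, PAD_CHAR)
        lowered = padded.lower()
        arrays.append([VOCAB.get(ch, 0) for ch in lowered])
    return arrays
```

Each resulting integer array has exactly the preset length, ready for the word embedding module.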
Optionally, in this embodiment, the local feature extraction module 503 is specifically configured to:
Performing convolution operation on the first domain name data through a convolution layer of the convolution neural network to obtain a feature map corresponding to the first domain name data;
and performing downsampling operation on the feature map through a pooling layer of the convolutional neural network to obtain a local key feature map of the first domain name data.
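The convolution operation performed by the convolution layer can be sketched minimally for one channel of the embedded sequence. Real convolution layers apply many learned kernels plus a bias and nonlinearity, all omitted here; the function name is hypothetical:

```python
def conv1d(sequence, kernel):
    """Valid-mode 1-D cross-correlation (the 'convolution' of CNN layers,
    without kernel flipping): slide the kernel over the sequence and take
    a dot product at each position."""
    k = len(kernel)
    return [sum(sequence[i + j] * kernel[j] for j in range(k))
            for i in range(len(sequence) - k + 1)]
```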
Optionally, the feature map is a matrix, and the local feature extraction module 503, when performing the downsampling operation on the feature map through the pooling layer of the convolutional neural network to obtain the local key feature map of the first domain name data, is specifically configured to:
Moving a pooling matrix with a preset size to each position of the feature map one by one to obtain a coverage area corresponding to each position, wherein the size of each coverage area is the same as that of the pooling matrix;
and obtaining the maximum value in a plurality of values of each coverage area, and obtaining a local key feature map according to the maximum value in each coverage area.
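The two pooling steps above (moving a fixed-size pooling window to each position and keeping the maximum value of each coverage area) correspond to standard max pooling, sketched here for a 2-D feature map. The window size and stride of 2 are illustrative assumptions:

```python
def max_pool2d(feature_map, pool_size=2, stride=2):
    """Move a pool_size x pool_size window over the feature map and keep
    the maximum value of each coverage area, as described above."""
    rows, cols = len(feature_map), len(feature_map[0])
    pooled = []
    for r in range(0, rows - pool_size + 1, stride):
        row_out = []
        for c in range(0, cols - pool_size + 1, stride):
            region = [feature_map[r + i][c + j]
                      for i in range(pool_size) for j in range(pool_size)]
            row_out.append(max(region))
        pooled.append(row_out)
    return pooled
```

With a 2x2 window and stride 2, a 4x4 feature map is downsampled to a 2x2 local key feature map.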
Optionally, in this embodiment, the mapping module 505 is specifically configured to:
the feature space of the local key feature map and the feature space of the context features are mapped to a one-dimensional sample label space by a fully connected layer of the convolutional neural network.
The training device for the probability prediction model provided in this embodiment may execute the technical scheme of the training method embodiment of the probability prediction model shown in fig. 1 and fig. 2, and its implementation principle and technical effect are similar to those of the training method embodiment of the probability prediction model shown in fig. 1 and fig. 2, and are not described in detail herein.
Fig. 6 is a schematic structural diagram of a non-target domain name detection device according to an embodiment of the present application. As shown in fig. 6, in this embodiment, the apparatus 600 for detecting a non-target domain name may be located in an electronic device. The non-target domain name detection apparatus 600 includes:
The preprocessing module 601 is configured to obtain domain name data in domain name system traffic, and preprocess the domain name data to obtain an integer array with a preset length;
The word embedding module 602 is configured to perform word embedding on the integer array so as to obtain first domain name data, where the first domain name data is a low-dimensional dense matrix;
The probability prediction module 603 is configured to, when the first domain name data is neither in a preset blacklist nor in a preset whitelist, input the first domain name data into a preset probability prediction model to obtain a prediction probability of the first domain name data, so as to determine whether the first domain name data is a target domain name or a non-target domain name according to the prediction probability, where the probability prediction model is a model obtained by the training device of the probability prediction model.
The technical scheme of the non-target domain name detection method shown in fig. 3 can be executed by the non-target domain name detection device provided in this embodiment, and the implementation principle and technical effects are similar to those of the non-target domain name detection method embodiment shown in fig. 3, and are not described in detail herein.
Meanwhile, the non-target domain name detection device 600 provided in this embodiment further refines the non-target domain name detection device provided in the previous embodiment.
Optionally, in this embodiment, when determining that the first domain name data is the target domain name or the non-target domain name according to the prediction probability, the probability prediction module 603 determines that the first domain name data is the target domain name if the prediction probability is smaller than a first preset value, and determines that the first domain name data is the non-target domain name if the prediction probability is greater than a second preset value, where the first preset value is smaller than the second preset value.
Optionally, in this embodiment, if the prediction probability is not less than the first preset value and not greater than the second preset value, the method further includes:
Acquiring entropy value, consonant letter number and domain name length of first domain name data;
if the entropy value is larger than the third preset value, the number of consonant letters is larger than the fourth preset value and the domain name length is larger than the fifth preset value, determining the first domain name data as the target domain name, otherwise, determining the first domain name data as the non-target domain name.
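The overall decision flow of the detection device, combining the blacklist/whitelist short-circuit performed by the probability prediction module 603, the two probability thresholds, and the entropy/consonant/length fallback for ambiguous probabilities, can be sketched as follows. The concrete threshold values, the labels returned for blacklist and whitelist hits, and the callable stand-ins for the trained model (`predict_prob`) and the auxiliary heuristic (`heuristic`) are all assumptions for illustration:

```python
def classify(domain, predict_prob, heuristic,
             blacklist=frozenset(), whitelist=frozenset(),
             first=0.3, second=0.7):
    """Decision flow sketch. predict_prob(domain) stands in for the trained
    probability prediction model; heuristic(domain) stands in for the
    entropy/consonant/length test. The blacklist/whitelist label mapping
    (blacklist -> non-target, whitelist -> target) is an assumption."""
    if domain in blacklist:
        return "non-target"
    if domain in whitelist:
        return "target"
    p = predict_prob(domain)
    if p < first:               # below the first preset value
        return "target"
    if p > second:              # above the second preset value
        return "non-target"
    # ambiguous region: apply the auxiliary three-threshold heuristic
    return "target" if heuristic(domain) else "non-target"
```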
The non-target domain name detection device provided in this embodiment may execute the technical scheme of the non-target domain name detection method embodiment shown in fig. 3 and fig. 4, and its implementation principle and technical effect are similar to those of the non-target domain name detection method embodiment shown in fig. 3 and fig. 4, and are not described in detail herein.
Fig. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present application. The electronic device is intended to represent various forms of electronic devices, such as microcomputers, single-chip microcomputers, and other suitable computers capable of performing the training method of the probabilistic predictive model or the non-target domain name detection method. The components shown herein, their connections and relationships, and their functions are meant to be exemplary only, and are not meant to limit implementations of the application described and/or claimed herein.
As shown in fig. 7, the electronic device 70 includes at least one processor 701 and a memory 702. The electronic device 70 further comprises communication means 703. Wherein the processor 701, the memory 702 and the communication means 703 are connected by a bus 704.
In a specific implementation, the at least one processor 701 executes computer-executable instructions stored in the memory 702, so that the at least one processor 701 executes the training method of the probabilistic predictive model or the method of detecting the non-target domain name as executed on the electronic device side.
The specific implementation process of the processor 701 may refer to the training method embodiment of the probability prediction model or the non-target domain name detection method embodiment, and its implementation principle and technical effect are similar, and this embodiment will not be described herein.
In the above embodiment, it should be understood that the processor 701 may be a central processing unit (Central Processing Unit, CPU), or may be another general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), or the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. The steps of the method disclosed in connection with the present invention may be directly embodied as being executed by a hardware processor, or executed by a combination of hardware and software modules in the processor.
The memory 702 may comprise high-speed RAM memory, and may also include non-volatile storage NVM, such as at least one disk memory.
Bus 704 may be an industry standard architecture (Industry Standard Architecture, ISA) bus, a peripheral component interconnect (Peripheral Component Interconnect, PCI) bus, or an extended industry standard architecture (Extended Industry Standard Architecture, EISA) bus, among others. The bus 704 may be divided into an address bus, a data bus, a control bus, and the like. For ease of illustration, the bus 704 in the present application is not limited to only one bus or one type of bus.
The scheme provided by the embodiment of the invention has been described above with respect to the functions implemented by the electronic device and the master device. It will be appreciated that, in order to implement the above functions, the electronic device or the master device includes corresponding hardware structures and/or software modules for performing the respective functions. In combination with the exemplary units and algorithm steps described in the embodiments disclosed herein, the present embodiments can be implemented in hardware or in a combination of hardware and computer software. Whether a function is implemented as hardware or as computer-software-driven hardware depends upon the particular application and the design constraints imposed on the solution. Those skilled in the art may implement the described functionality using different approaches for each particular application, but such implementation is not to be considered as beyond the scope of the embodiments of the present invention.
The application also provides a computer-readable storage medium, where the computer-readable storage medium stores computer-executable instructions, and when a processor executes the computer-executable instructions, the training method of the probability prediction model or the detection method of the non-target domain name is implemented.
The computer readable storage medium described above may be implemented by any type of volatile or non-volatile memory device or combination thereof, such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic disk, or optical disk. A readable storage medium can be any available medium that can be accessed by a general purpose or special purpose computer.
An exemplary readable storage medium is coupled to the processor such that the processor can read information from, and write information to, the readable storage medium. The readable storage medium may also be integral to the processor. The processor and the readable storage medium may reside in an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC). Alternatively, the processor and the readable storage medium may reside as discrete components in an electronic device or a host device.
Memory 702 is a non-transitory computer readable storage medium provided by the present invention. The non-transitory computer readable storage medium of the present invention stores computer instructions for causing a computer to execute the training method of the probabilistic predictive model provided by the present invention or the detection method of the non-target domain name.
The memory 702, as a non-transitory computer-readable storage medium, may be used to store non-transitory software programs, non-transitory computer-executable programs, and modules, such as the program instructions/modules corresponding to the training method of the probability prediction model shown in fig. 5 (e.g., the preprocessing module 501, the word embedding module 502, the local feature extraction module 503, the context feature extraction module 504, the mapping module 505, and the output module 506) or the program instructions/modules corresponding to the non-target domain name detection method shown in fig. 6 (e.g., the preprocessing module 601, the word embedding module 602, and the probability prediction module 603). The processor 701 executes various functional applications and data processing by running the non-transitory software programs, instructions, and modules stored in the memory 702, that is, implements the training method of the probabilistic predictive model or the detection method of the non-target domain name in the above method embodiments.
Meanwhile, the present embodiment also provides a computer program product, which enables the training method of the probability prediction model of the above embodiment or the detection method of the non-target domain name to be performed when instructions in the computer program product are executed by a processor.
Other implementations of the examples of the invention will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This invention is intended to cover any variations, uses, or adaptations of embodiments of the invention following, in general, the principles of the embodiments of the invention and including such departures from the present disclosure as come within known or customary practice within the art to which the embodiments of the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the embodiments being indicated by the following claims.
It is to be understood that the embodiments of the invention are not limited to the precise arrangements and instrumentalities shown in the drawings, which have been described above, and that various modifications and changes may be made without departing from the scope thereof. The scope of embodiments of the invention is limited only by the appended claims.
Claims (10)
1. A method for training a probabilistic predictive model, comprising:
Acquiring domain name data in domain name system traffic, and preprocessing the domain name data to obtain integer arrays with preset lengths;
Word embedding is carried out on the integer array so as to obtain first domain name data, wherein the first domain name data is a low-dimensional dense matrix;
performing feature extraction and dimension reduction on the first domain name data through a convolutional neural network to obtain a local key feature map of the first domain name data;
obtaining the context features of the local key feature map through a gated recurrent unit;
When the task is a classification task, mapping the feature space of the local key feature map and the feature space of the context feature to a one-dimensional sample label space through the convolutional neural network, wherein the classification task is used to indicate whether the first domain name data is a target domain name or a non-target domain name;
And obtaining the prediction probability of the first domain name data according to the one-dimensional sample label space, so as to obtain the probability that the first domain name data is a non-target domain name through the prediction probability.
2. The method for training a probabilistic predictive model of claim 1, wherein the performing feature extraction and dimension reduction on the first domain name data through a convolutional neural network to obtain a local key feature map of the first domain name data comprises:
Performing convolution operation on the first domain name data through a convolution layer of the convolution neural network to obtain a feature map corresponding to the first domain name data;
and performing downsampling operation on the feature map through a pooling layer of the convolutional neural network to obtain a local key feature map of the first domain name data.
3. The method for training a probabilistic predictive model as recited in claim 2, wherein the feature map is a matrix;
The step of performing downsampling operation on the feature map through the pooling layer of the convolutional neural network to obtain a local key feature map of the first domain name data includes:
moving a pooling matrix with a preset size to each position of the feature map one by one to obtain a coverage area corresponding to each position, wherein the size of each coverage area is the same as that of the pooling matrix;
And obtaining the maximum value in a plurality of numerical values of each coverage area, and obtaining the local key feature map according to the maximum value in each coverage area.
4. The method of training a probabilistic predictive model of claim 1, wherein the domain name data comprises a number of uppercase letters and a number of lowercase letters;
The preprocessing of the domain name data to obtain integer arrays with preset lengths comprises the following steps:
deleting repeated data and irrelevant data in the domain name data to obtain second domain name data;
Filling the second domain name data through filling characters to obtain third domain name data, wherein the length of the third domain name data is a preset length;
converting each capital letter in the third domain name data into a corresponding lowercase letter to obtain fourth domain name data;
and performing dictionary mapping on the fourth domain name data to obtain the integer array with the preset length.
5. The method of training a probabilistic predictive model as recited in claim 1, wherein said mapping, via the convolutional neural network, the feature space of the local key feature map and the feature space of the contextual feature to a one-dimensional sample label space comprises:
and mapping the feature space of the local key feature map and the feature space of the context feature to a one-dimensional sample label space through a fully connected layer of the convolutional neural network.
6. A method for detecting a non-target domain name, comprising:
Acquiring domain name data in domain name system traffic, and preprocessing the domain name data to obtain integer arrays with preset lengths;
Word embedding is carried out on the integer array so as to obtain first domain name data, wherein the first domain name data is a low-dimensional dense matrix;
When the first domain name data is neither in a preset blacklist nor in a preset whitelist, inputting the first domain name data into a preset probability prediction model to obtain the prediction probability of the first domain name data so as to determine whether the first domain name data is a target domain name or a non-target domain name according to the prediction probability, wherein the probability prediction model is a model obtained by the training method of the probability prediction model according to any one of claims 1 to 5.
7. The method according to claim 6, wherein determining whether the first domain name data is a target domain name or a non-target domain name according to the prediction probability comprises:
If the prediction probability is smaller than a first preset value, determining that the first domain name data is a target domain name; and if the prediction probability is larger than a second preset value, determining that the first domain name data is a non-target domain name, wherein the first preset value is smaller than the second preset value.
8. The method according to claim 7, wherein if the prediction probability is not less than the first preset value and not greater than the second preset value, the method further comprises:
Acquiring an entropy value, the number of consonant letters and the domain name length of the first domain name data;
if the entropy value is larger than a third preset value, the number of consonant letters is larger than a fourth preset value and the domain name length is larger than a fifth preset value, determining that the first domain name data is a target domain name, otherwise, determining that the first domain name data is a non-target domain name.
9. A training device for a probabilistic predictive model, comprising:
The preprocessing module is used for acquiring domain name data in domain name system traffic and preprocessing the domain name data to obtain integer arrays with preset lengths;
The word embedding module is used for word embedding the integer array so as to obtain first domain name data, wherein the first domain name data is a low-dimensional dense matrix;
The local feature extraction module is used for carrying out feature extraction and dimension reduction on the first domain name data through a convolutional neural network to obtain a local key feature map of the first domain name data;
the context feature extraction module is used for obtaining the context features of the local key feature map through the gated recurrent unit;
The mapping module is used for mapping, when the task is a classification task, the feature space of the local key feature map and the feature space of the context feature to a one-dimensional sample label space through the convolutional neural network, wherein the classification task is used to indicate whether the first domain name data is a target domain name or a non-target domain name;
And the output module is used for obtaining the prediction probability of the first domain name data according to the one-dimensional sample label space, so as to obtain the probability that the first domain name data is a non-target domain name through the prediction probability.
10. A device for detecting a non-target domain name, comprising:
The preprocessing module is used for acquiring domain name data in domain name system traffic and preprocessing the domain name data to obtain integer arrays with preset lengths;
The word embedding module is used for word embedding the integer array so as to obtain first domain name data, wherein the first domain name data is a low-dimensional dense matrix;
The probability prediction module is used for inputting the first domain name data into a preset probability prediction model when the first domain name data is neither in a preset blacklist nor in a preset whitelist, so as to obtain the prediction probability of the first domain name data, and determining whether the first domain name data is a target domain name or a non-target domain name according to the prediction probability, wherein the probability prediction model is a model obtained through the training device of the probability prediction model according to claim 9.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202411597969.1A CN119675898B (en) | 2024-11-08 | 2024-11-08 | Probabilistic prediction model training method, non-target domain name detection method and device |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202411597969.1A CN119675898B (en) | 2024-11-08 | 2024-11-08 | Probabilistic prediction model training method, non-target domain name detection method and device |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN119675898A CN119675898A (en) | 2025-03-21 |
| CN119675898B true CN119675898B (en) | 2025-10-14 |
Family
ID=94999662
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202411597969.1A Active CN119675898B (en) | 2024-11-08 | 2024-11-08 | Probabilistic prediction model training method, non-target domain name detection method and device |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN119675898B (en) |
Citations (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN115242484A (en) * | 2022-07-19 | 2022-10-25 | 深圳大学 | A DGA Domain Name Detection Model and Method Based on Gated Convolution and LSTM |
| CN118199922A (en) * | 2024-02-06 | 2024-06-14 | 江苏省未来网络创新研究院 | Malicious mining domain name detection method based on deep learning |
Family Cites Families (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US8621638B2 (en) * | 2010-05-14 | 2013-12-31 | Mcafee, Inc. | Systems and methods for classification of messaging entities |
| CN118368136B (en) * | 2024-05-23 | 2025-11-28 | 中国建设银行股份有限公司 | DGA domain name detection method and device |
- 2024-11-08: CN CN202411597969.1A patent/CN119675898B/en, status active
Patent Citations (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN115242484A (en) * | 2022-07-19 | 2022-10-25 | 深圳大学 | A DGA Domain Name Detection Model and Method Based on Gated Convolution and LSTM |
| CN118199922A (en) * | 2024-02-06 | 2024-06-14 | 江苏省未来网络创新研究院 | Malicious mining domain name detection method based on deep learning |
Also Published As
| Publication number | Publication date |
|---|---|
| CN119675898A (en) | 2025-03-21 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| Sadiq et al. | A review of phishing attacks and countermeasures for internet of things‐based smart business applications in industry 4.0 | |
| Zhao et al. | A review of computer vision methods in network security | |
| Salahdine et al. | Phishing attacks detection a machine learning-based approach | |
| Vijayalakshmi et al. | Web phishing detection techniques: a survey on the state‐of‐the‐art, taxonomy and future directions | |
| Jain et al. | A novel approach to protect against phishing attacks at client side using auto-updated white-list | |
| Vinayakumar et al. | Scalable framework for cyber threat situational awareness based on domain name systems data analysis | |
| Hossain et al. | Machine learning-based phishing attack detection | |
| Yuan et al. | A novel approach for malicious URL detection based on the joint model | |
| Madhubala et al. | Survey on malicious URL detection techniques | |
| CN112182614B (en) | Dynamic Web application protection system | |
| CN116668089B (en) | Network attack detection method, system and medium based on deep learning | |
| CN115695043A (en) | Vulnerability scanning attack detection method, model training method and device | |
| Alsaedi et al. | Multi-modal features representation-based convolutional neural network model for malicious website detection | |
| Liu et al. | Spatial‐Temporal Feature with Dual‐Attention Mechanism for Encrypted Malicious Traffic Detection | |
| Rao et al. | A hybrid super learner ensemble for phishing detection on mobile devices | |
| Ferreira | Malicious URL detection using machine learning algorithms | |
| CN112995218A (en) | Domain name anomaly detection method, device and equipment | |
| CN117040804A (en) | Network attack detection methods, devices, equipment, media and program products for websites | |
| CN119675898B (en) | Probabilistic prediction model training method, non-target domain name detection method and device | |
| Ray et al. | Detection of malicious URLs using deep learning approach | |
| Cui et al. | Proactive detection of phishing kit traffic | |
| Namasivayam | Categorization of Phishing Detection Features and Using the Feature Vectors to Classify Phishing Websites | |
| CN115168569A (en) | UEBA-based data processing method and system | |
| Krupalin et al. | A survey and taxonomy of anti-phishing techniques for detecting fake websites | |
| Shaik et al. | Privacy preserving machine learning for malicious url detection |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| PB01 | Publication | ||
| SE01 | Entry into force of request for substantive examination | ||
| GR01 | Patent grant | ||