CN112804189A - Intrusion detection method for the Internet of Vehicles based on cloud-fog collaboration

Info

Publication number: CN112804189A
Application number: CN202011491452.6A
Authority: CN
Inventors: 赖英旭; 曹天浩; 刘静; 王一鹏
Original assignee: Beijing University of Technology
Current assignee: Beijing University of Technology
Priority date: 2020-12-17
Filing date: 2020-12-17
Publication date: 2021-05-14

Abstract

The invention discloses an intrusion detection method for the Internet of Vehicles (IoV) based on cloud-fog collaboration, which consists of three main parts. Step 1: because fog nodes and cloud servers differ in computing capability, a cloud-fog collaborative defense architecture is designed. Traffic data is divided at the fog node into normal data and suspicious data, and the suspicious data is classified in detail on the cloud server, which has powerful computing resources, to determine the attack type. Step 2: because fog-node resources are limited and the network environment is complex and changeable, the CART decision tree algorithm is used to separate suspicious data from benign data quickly. Step 3: to address the data imbalance problem in the Internet of Vehicles scenario, a cost-sensitive CNN model is designed to classify the suspicious data in detail and reduce the false negative rate for minority attack classes. The algorithm is evaluated on a dataset that simulates real-world IoV traffic and achieves high performance with low resource requirements.

Description

Intrusion detection method for the Internet of Vehicles based on cloud-fog collaboration

Technical Field

The invention relates to the technical field of Internet of Vehicles network security, and in particular to an intrusion detection method for the Internet of Vehicles based on cloud-fog collaboration.

Background

The rapid development of intelligent transportation enables vehicles to communicate with adjacent vehicles and with network infrastructure and to obtain traffic conditions in time, which improves safety and efficiency but also raises many security issues. Hacker attacks pose a serious security risk to the Internet of Vehicles: by exploiting a vulnerability, an attacker can gain access to the network and tamper with confidential data, which can cause more accidents and defeat the purpose of safe driving. An attack may destroy the system functions of the Internet of Vehicles, or may abuse the Internet of Vehicles for the attacker's own purposes. For example, a hacker may penetrate the in-vehicle network by stealing the on-board device and attack the vehicle from the external network, and then use the compromised vehicle to interfere with other users in the Internet of Vehicles environment, which may seriously harm users' interests and even threaten their personal safety.

The Internet of Vehicles is a fast-moving, highly dynamic network, so real-time information sharing among vehicles is very important. Because vehicles encounter one another only briefly and must act quickly on the information they receive, the reliability of that information must be determined quickly. Cryptographic approaches require pairwise keys and incur computation, storage, and time overhead; moreover, key theft can lead to intrusion into the Internet of Vehicles, and attacks launched from inside a vehicle are even harder to guard against. Therefore, an intrusion detection system must be deployed in the Internet of Vehicles to detect attacks.

In addition to these security challenges, a vehicle also needs to process the data it collects and the data it receives from other vehicles. If the collected traffic data is sent to the cloud for the required computation and the result is then communicated back to the vehicles, the computation and communication overhead among vehicles can be limited and vehicle privacy can be improved. However, because road information is time sensitive, this solution may be inefficient. In fog computing, fog nodes sit between end users and the cloud; with roadside units acting as fog nodes, fog computing can be an alternative for computing road conditions. In this case, each roadside unit collects traffic data from the vehicles in its area and extracts road conditions by analyzing the collected data. Communication, detection, and positioning between vehicles can then take place indirectly through the fog nodes.

Therefore, to solve the information security problem of the Internet of Vehicles, an intrusion detection method for the Internet of Vehicles based on cloud-fog collaboration is provided. The method fully considers the characteristics that distinguish the Internet of Vehicles from the traditional Internet, mainly its higher requirements for computing capacity, storage capacity, and security, and builds the intrusion detection model with machine learning and deep learning techniques, which have recently achieved breakthroughs in many fields.

Disclosure of Invention

The invention aims to provide a network intrusion detection method for the Internet of Vehicles that addresses the network security problem of the highly dynamic Internet of Vehicles. The technical solution is an intrusion detection method for the Internet of Vehicles based on cloud-fog collaboration, comprising the following steps:

Step 1, converting the Internet of Vehicles data into a feature vector data set, wherein the feature vectors specifically comprise the IP address and type of the 802.11p protocol; the time, source IP, destination IP, protocol name, packet size, port number, and flag information in the UDP datagram and IP datagram; and the packet loss rate and number of communication links. The feature vector data set is learned with the CART decision tree algorithm at the resource-limited fog node to obtain a CART decision tree classifier;

Step 2, preliminarily classifying the data with the CART decision tree at the fog node, the fog node sending the preliminary classification result to the cloud server;

Step 3, deploying the cost-sensitive CNN algorithm on the cloud server to classify in detail the data sent by the fog node.
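As an illustration of Step 1, a packet record could be turned into a numeric feature vector roughly as follows. This is a minimal sketch only: the `PacketRecord` class, its field names, the `PROTOCOL_CODES` table, and the choice to encode IP addresses as integers are all assumptions made for the example. The text lists the fields but does not specify an encoding, and the 802.11p address and type fields are left out here for brevity.

```python
import ipaddress
from dataclasses import dataclass

# Hypothetical protocol-name encoding; the source does not define one.
PROTOCOL_CODES = {"UDP": 0, "TCP": 1, "ICMP": 2}


@dataclass
class PacketRecord:
    timestamp: float          # time
    source_ip: str
    destination_ip: str
    protocol_name: str
    packet_size: int          # bytes
    port_number: int
    flag: int
    packet_loss_rate: float
    link_count: int           # number of communication links

    def to_vector(self):
        """Flatten the record into a list of floats for a tree classifier."""
        return [
            float(self.timestamp),
            float(int(ipaddress.ip_address(self.source_ip))),
            float(int(ipaddress.ip_address(self.destination_ip))),
            float(PROTOCOL_CODES[self.protocol_name]),
            float(self.packet_size),
            float(self.port_number),
            float(self.flag),
            float(self.packet_loss_rate),
            float(self.link_count),
        ]


# Hypothetical example record.
record = PacketRecord(0.5, "10.0.0.1", "10.0.0.2", "UDP", 128, 4000, 0, 0.01, 3)
vector = record.to_vector()
```

Integer-encoded addresses keep the sketch short; a real deployment might prefer categorical or subnet-level features, which the source leaves open.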

The key technical points of the invention are as follows. First, the CART decision tree algorithm is applied for the first time to fog-node detection in the Internet of Vehicles; the model is simple and its rules are easy to extract, and binary recursive splitting yields a simple binary decision tree, which suits the limited resources of fog nodes and the need for real-time detection. Second, because fog-node resources are limited while cloud-server resources are comparatively abundant, different computing tasks are assigned to the fog nodes and the cloud server to realize collaborative computing: the fog node divides the data into normal data and suspicious data and sends the suspicious data to the cloud server, where the specific attack category is detected. Third, the cloud server uses a cost-sensitive CNN, that is, a cost matrix is added between the softmax and loss layers of the CNN and the parameters are updated automatically through joint optimization, which improves the detection accuracy for attacks.

The invention has the beneficial effects that:

First, the invention avoids sending all collected traffic data to the cloud for the required computation, which reduces end-to-end delay; the traffic passing through the fog nodes is detected with the CART algorithm, which meets the real-time requirement of the Internet of Vehicles.

Second, the cloud-fog collaborative mode lets the fog nodes and the cloud server work together, so that the storage and computing advantages of the different devices are better used to detect attack behavior in the environment.

Third, the improved CNN algorithm handles the imbalanced data found in real scenarios better, detects attack behavior accurately, and protects the security of data in the cloud server.

In conclusion, the CART algorithm detects attack behavior more quickly and meets the real-time requirement of the fog node. The cloud-fog collaborative mode makes effective use of the resources of both the fog nodes and the cloud server. On the Internet of Vehicles server side, the cost-sensitive CNN method improves detection accuracy on imbalanced data. The invention can detect abnormal behavior in Internet of Vehicles network traffic and protect the network security of the Internet of Vehicles.

Drawings

FIG. 1 is a schematic diagram of the overall structure of the invention.

FIG. 2 is a schematic diagram of the cloud-fog collaborative detection of the invention.

FIG. 3 is a diagram of the CNN model used in the invention.

Detailed Description

The principles and features of the invention are described below with reference to the drawings; the examples given serve only to explain the invention and are not intended to limit its scope.

Embodiment one

In embodiment one, the CART decision tree algorithm is used. The model is simple and its rules are easy to extract, and binary recursive splitting yields a simple binary decision tree, which makes the algorithm suitable for intrusion detection on a fog node. The algorithm works as follows:

Step 1, calculating the GINI coefficient of each attribute and selecting the attribute with the minimum GINI coefficient as the splitting attribute of the root node. For a continuous attribute, a split threshold is calculated, the attribute is discretized according to that threshold, and its GINI coefficient is calculated. For a discrete attribute, the sample set is divided according to the possible subsets of the attribute's values: if the attribute has $n$ distinct values, there are $2^n - 2$ valid subsets (the proper, non-empty ones); the subset with the smallest GINI coefficient is selected as the partition of the discrete attribute, and that smallest GINI coefficient is the GINI coefficient of the attribute.

Calculation of the GINI coefficient:

(1) Assume the entire sample set is $S$ and the class set is $\{C_1, C_2, \ldots, C_n\}$, so that the data is divided into $n$ classes, each class $C_i$ corresponding to a sample subset $S_i$. Let $|S|$ be the number of samples in $S$ and $|C_i|$ be the number of samples in $S$ that belong to class $C_i$. The GINI coefficient of the sample set is defined as

$$\mathrm{Gini}(S) = 1 - \sum_{i=1}^{n} p_i^2$$

where $p_i = |C_i| / |S|$ is the probability that a sample in $S$ belongs to class $C_i$.

(2) When only binary splits are considered, attribute $A$ divides the training sample set $S$ into subsets $S_1$ and $S_2$, and the GINI coefficient of this partition of $S$ is

$$\mathrm{Gini}_A(S) = \sum_{k=1}^{2} \frac{|S_k|}{|S|}\,\mathrm{Gini}(S_k)$$

where $|S_k| / |S|$ is the weight of the $k$-th subset in the whole sample set.

Step 2, splitting the sample set on the selected attribute. If the splitting attribute is continuous, the sample set is divided into two parts according to the value of the attribute, values less than or equal to $T$ and values greater than $T$, where $T$ is the split threshold of the continuous attribute; if the splitting attribute is discrete, the sample set is divided into two parts according to whether the value of the attribute is contained in the proper subset of the discrete attribute that has the minimum GINI coefficient.

Step 3, for the two sample subsets $S_1$ and $S_2$ produced by the splitting attribute of the root node, recursively building the child nodes of the tree with the same method as in step 1. This process repeats until all samples in every child node belong to the same category or no attribute remains that can be selected as a splitting attribute.

Step 4, pruning the generated decision tree.
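The GINI-based split selection of Step 1 can be sketched in Python as follows. The sketch assumes the standard CART definitions, a GINI coefficient of one minus the sum of squared class probabilities and a size-weighted sum over the two subsets of a binary partition. It scans the midpoints between adjacent distinct values as candidate thresholds, which is a common convention rather than something the text specifies. The function names and the toy packet-size data are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini(S) = 1 - sum_i p_i^2, with p_i = |C_i| / |S|."""
    total = len(labels)
    if total == 0:
        return 0.0
    return 1.0 - sum((count / total) ** 2 for count in Counter(labels).values())


def split_gini(left, right):
    """Size-weighted GINI coefficient of a binary partition (S1, S2)."""
    total = len(left) + len(right)
    return len(left) / total * gini(left) + len(right) / total * gini(right)


def best_threshold(values, labels):
    """Return (T, gini) for the best split `value <= T` versus `value > T`."""
    distinct = sorted(set(values))
    best = (None, float("inf"))
    # Candidate thresholds: midpoints between adjacent distinct values.
    for low, high in zip(distinct, distinct[1:]):
        t = (low + high) / 2
        left = [y for v, y in zip(values, labels) if v <= t]
        right = [y for v, y in zip(values, labels) if v > t]
        score = split_gini(left, right)
        if score < best[1]:
            best = (t, score)
    return best


# Hypothetical toy data: packet sizes with normal/suspicious labels.
packet_sizes = [60, 64, 70, 1400, 1450, 1500]
labels = ["normal", "normal", "normal", "suspicious", "suspicious", "suspicious"]
threshold, score = best_threshold(packet_sizes, labels)
```

On this toy data the two classes are perfectly separable by packet size, so the selected split leaves both subsets pure. Recursive tree building and pruning (Steps 3 and 4) are omitted from the sketch.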

Based on this method, the invention uses detection time and recall, evaluation indexes commonly used in the field of machine learning, to evaluate the effectiveness and reliability of the algorithm. The evaluation criteria are defined as follows:

Detection time = T2 (detection completion time) - T1 (detection start time)
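The two evaluation indexes could be measured along the following lines. This is a sketch under stated assumptions: recall is taken in its standard form, true positives divided by the sum of true positives and false negatives, and detection time is measured with a monotonic clock around the classification call. The `toy_classifier` function and the sample data are hypothetical stand-ins, not part of the described method.

```python
import time


def recall(true_labels, predicted_labels, positive="attack"):
    """Recall = TP / (TP + FN) for the class named by `positive`."""
    pairs = list(zip(true_labels, predicted_labels))
    true_positives = sum(1 for t, p in pairs if t == positive and p == positive)
    false_negatives = sum(1 for t, p in pairs if t == positive and p != positive)
    relevant = true_positives + false_negatives
    return true_positives / relevant if relevant else 0.0


def timed_detection(classify, samples):
    """Return (predictions, T2 - T1) for running `classify` over `samples`."""
    t1 = time.perf_counter()                     # T1: detection start time
    predictions = [classify(sample) for sample in samples]
    t2 = time.perf_counter()                     # T2: detection completion time
    return predictions, t2 - t1


# Hypothetical stand-in classifier: flag packets larger than 1000 bytes.
def toy_classifier(packet_size):
    return "attack" if packet_size > 1000 else "normal"


sizes = [60, 1400, 900, 1500]
truth = ["normal", "attack", "attack", "attack"]
predictions, detection_time = timed_detection(toy_classifier, sizes)
detection_recall = recall(truth, predictions)
```

Here one of the three attack samples is missed by the stand-in classifier, so the recall is two thirds.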

Embodiment two

As shown in FIG. 1, embodiment two is the overall structure of the invention in the Internet of Vehicles environment, which is divided into three main parts: the cloud server, the fog nodes, and the terminal devices. As shown in FIG. 2, in order to use the resources of fog computing and cloud computing reasonably and perform the intrusion detection task effectively, the cloud-fog collaborative detection method of the invention comprises:

Step 21, the fog node performs binary classification, dividing the data into normal data and suspicious data. If the fog node detects normal data, the data is processed locally, which reduces the data sent to the cloud server and thereby protects user privacy data in the intelligent transportation environment.

Step 22, on the fog node, if the detected data is abnormal, the fog node sends the abnormal data to the cloud server.

Step 23, on the cloud server, the cost-sensitive CNN algorithm performs multi-class classification on the abnormal data to obtain the specific attack type.

Step 24, the response system in the cloud server sends the result to the administrator on the fog-node side, who can then find the infected smart device and take corresponding measures. In this way the fog nodes and the cloud server work collaboratively.
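The division of work in steps 21 to 24 can be pictured with a small routing sketch. Everything named here is hypothetical: the `FogNode` and `CloudServer` classes, the `packet_rate` feature, and the lambda classifiers merely stand in for the CART classifier and the cost-sensitive CNN, and network transport between the two tiers is reduced to a method call.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Features = Dict[str, float]


@dataclass
class CloudServer:
    # Stand-in for the cost-sensitive CNN multi-class classifier.
    classify_attack: Callable[[Features], str]
    reports: List[str] = field(default_factory=list)

    def handle(self, features: Features) -> str:
        attack_type = self.classify_attack(features)   # step 23
        self.reports.append(attack_type)               # step 24: result reported back
        return attack_type


@dataclass
class FogNode:
    # Stand-in for the binary CART classifier.
    is_suspicious: Callable[[Features], bool]
    cloud: CloudServer
    forwarded: int = 0

    def process(self, features: Features) -> str:
        if not self.is_suspicious(features):
            return "normal"                  # step 21: handled locally
        self.forwarded += 1                  # step 22: forward suspicious data
        return self.cloud.handle(features)


# Hypothetical thresholds, chosen only to exercise both paths.
cloud = CloudServer(classify_attack=lambda f: "dos" if f["packet_rate"] > 1000 else "probe")
fog = FogNode(is_suspicious=lambda f: f["packet_rate"] > 100, cloud=cloud)
results = [fog.process({"packet_rate": rate}) for rate in (10, 500, 5000)]
```

Only the two suspicious samples reach the cloud tier, which mirrors the stated goal of reducing the data sent to the cloud server.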

Embodiment three

Embodiment three is an improvement to the CNN algorithm used on the cloud server. In practice, the network traffic passing through intelligent-transportation fog nodes contains a large amount of normal traffic and only a small amount of abnormal traffic, so the invention applies automatically learned cost sensitivity to a convolutional neural network trained on imbalanced data.

Step 1, the invention introduces a new cost matrix $\xi$ that modifies the last layer of the CNN, between the softmax and loss layers, so that the model classifies infrequent classes correctly. The CNN output $o$ is modified with the cost matrix $\xi$ according to the function $F$ as follows:

$$y^{(i)} = F\left(\xi_p, o^{(i)}\right), \quad \text{such that} \quad y_p^{(i)} \ge y_j^{(i)} \;\; \forall j \ne p$$

where $y^{(i)}$ represents the modified output, $p$ represents the desired class, $F$ represents a function, specifically the cross-entropy loss function, $o^{(i)}$ is the output of the CNN, and the condition $y_p^{(i)} \ge y_j^{(i)}$ indicates that, after modification, the desired class outputs a higher value than the other classes.

Step 2, to address the class imbalance problem in CNN training, a cost-sensitive error function is introduced, expressed as the average loss over the training set:

$$E(\theta, \xi) = \frac{1}{M} \sum_{i=1}^{M} l\left(d^{(i)}, y_{\theta,\xi}^{(i)}\right)$$

where the predicted output $y$ before the loss layer depends on the parameters $\theta$ and $\xi$, $\theta$ denotes the CNN parameters, $\xi$ denotes the cost matrix parameters, $M$ is the total number of training samples, $N$ is the number of neurons in the output layer, $l(d^{(i)}, y_{\theta,\xi}^{(i)})$ is the cross-entropy loss function, $d \in \{0, 1\}^{1 \times N}$ is the desired output (with $\sum_n d_n = 1$), and $y^{(i)}$ is the resulting softmax value. The worse the model performs on the training set, the larger the error; the learning algorithm aims to find the optimal parameters $(\theta^*, \xi^*)$ that minimize this average cost-sensitive loss. The optimization objective is therefore

$$(\theta^*, \xi^*) = \arg\min_{\theta, \xi} E(\theta, \xi) \quad \text{(formula 4)}$$

The loss function in the formula is the cross-entropy loss, which maximizes the closeness of the prediction to the desired output:

$$l(d, y) = -\sum_{n} d_n \log y_n \quad \text{(formula 5)}$$

where $d_n$ is the desired output (with $\sum_n d_n = 1$) and $y_n$ is the resulting softmax value. Here $y_n$ incorporates the class-dependent cost matrix and is related to the output $o_n$ through the softmax function; the following formula shows where the cost matrix is added:

$$y_n = \frac{\xi_{p,n} \exp(o_n)}{\sum_{k} \xi_{p,k} \exp(o_k)}$$

where the plain softmax output of the $n$-th element is

$$\frac{\exp(o_n)}{\sum_{k} \exp(o_k)}$$

which is the ratio of the exponential of the $n$-th element to the sum of the exponentials of all elements.
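A cost-weighted softmax followed by the cross-entropy loss of formula 5 might look like the sketch below. It assumes that the cost enters as a per-class multiplier on each exponential before normalization, using the row of the cost matrix that belongs to the desired class. The function names, the example outputs, and the cost values are hypothetical, and the sketch says nothing about how the cost values themselves are learned.

```python
import math


def cost_sensitive_softmax(outputs, cost_row):
    """y_n = xi_n * exp(o_n) / sum_k xi_k * exp(o_k) for one cost-matrix row."""
    shift = max(outputs)  # subtracting the max leaves the ratio unchanged and avoids overflow
    weighted = [cost * math.exp(o - shift) for cost, o in zip(cost_row, outputs)]
    total = sum(weighted)
    return [w / total for w in weighted]


def cross_entropy(desired, predicted):
    """l(d, y) = -sum_n d_n * log(y_n); terms with d_n = 0 contribute nothing."""
    return -sum(d * math.log(y) for d, y in zip(desired, predicted) if d > 0)


# Hypothetical network outputs for three classes; the desired class is the third.
outputs = [2.0, 1.0, 0.1]
desired = [0, 0, 1]

uniform = cost_sensitive_softmax(outputs, [1.0, 1.0, 1.0])   # reduces to plain softmax
weighted = cost_sensitive_softmax(outputs, [1.0, 1.0, 5.0])  # hypothetical cost row

loss_uniform = cross_entropy(desired, uniform)
loss_weighted = cross_entropy(desired, weighted)
```

With a uniform cost row the function is the ordinary softmax. With the hypothetical non-uniform row the third class receives a larger share of the normalized output, and the result still lies in [0, 1] and sums to one.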

Step 3, learning optimal parameters

When the cross-entropy loss function is used, the goal is to learn the parameters $\theta$ and the class-dependent cost parameters $\xi$ jointly. For the joint optimization, the two sets of parameters are solved alternately: one is held fixed while the cost is minimized with respect to the other. The algorithm is as follows:

The CNN algorithm is improved with this method, and the improved algorithm is deployed on the intelligent-transportation cloud server to classify intrusion data.
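The control flow of the alternating optimization in Step 3 can be sketched as a loop that takes the actual update rules as callbacks. This is only a skeleton under stated assumptions: `update_theta`, `update_xi`, and `evaluate_error` are hypothetical callables standing in for the backpropagation step, the cost-matrix gradient step, and the forward-pass error. The order of the two updates within an epoch, the factor-of-100 reduction of the cost-matrix learning rate, and the initial error of 1 follow the procedure laid out in the claims, and keeping the lower error as the new reference value is an assumption about what updating the error means there.

```python
def train_alternating(batches, update_theta, update_xi, evaluate_error,
                      max_epochs=3, xi_learning_rate=0.1, initial_error=1.0):
    """Alternate between cost-matrix updates and network-parameter updates."""
    reference_error = initial_error            # error initialized to 1
    for _ in range(max_epochs):                # epoch loop
        update_xi(xi_learning_rate)            # theta held fixed, xi updated
        for batch in batches:                  # batch loop
            update_theta(batch)                # xi held fixed, theta updated
        error = evaluate_error()               # error from a forward pass
        if error > reference_error:
            xi_learning_rate *= 0.01           # reduce by a factor of 100
        else:
            reference_error = error            # assumption: keep the lower error
    return xi_learning_rate, reference_error


# Hypothetical stand-ins that only count calls and replay fixed errors.
calls = {"theta": 0, "xi": 0}
errors = iter([0.8, 0.9, 0.5])


def fake_update_theta(batch):
    calls["theta"] += 1


def fake_update_xi(learning_rate):
    calls["xi"] += 1


final_rate, final_error = train_alternating(
    batches=[[1, 2], [3, 4]],
    update_theta=fake_update_theta,
    update_xi=fake_update_xi,
    evaluate_error=lambda: next(errors),
)
```

Passing the update rules in as callbacks keeps the sketch independent of any particular deep learning framework; a real implementation would supply gradient-based updates for both parameter sets.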

Step 4, the structure of the CNN model is shown in FIG. 3. The model consists of two convolutional layers, two pooling layers, a fully connected layer, a dropout layer, and a softmax classifier.

Based on the method, the invention adopts the detection accuracy (precision) evaluation index commonly used in the field of machine learning to evaluate the effectiveness and reliability of the algorithm. The evaluation criteria are defined as follows:

It should be understood that although this description is organized by embodiment, an embodiment does not necessarily contain only a single independent technical solution; the description is organized this way for clarity only. Those skilled in the art should treat the description as a whole, and the technical solutions of the embodiments may be combined as appropriate to form other implementations that those skilled in the art can understand.

Claims

1. An intrusion detection method for the Internet of Vehicles based on cloud-fog collaboration, characterized in that a cloud-fog collaborative data classification method is adopted, in which a CART decision tree classifier performs coarse classification at the fog node and a cost-sensitive CNN algorithm performs specific classification on the cloud server, the method comprising:

Step 1: converting the Internet of Vehicles data into a feature vector data set, wherein the feature vectors specifically comprise the IP address and type of the 802.11p protocol; the time, source IP, destination IP, protocol name, packet size, port number, and flag information in the UDP datagram and IP datagram; and the packet loss rate and number of communication links; and learning the feature vector data set with the CART decision tree algorithm at the resource-limited fog node to obtain a CART decision tree classifier;

Step 2: preliminarily classifying the feature vector data set with the CART decision tree at the fog node, and sending the preliminarily classified feature vector data to the cloud server;

Step 3: deploying the cost-sensitive CNN algorithm on the cloud server, the cost-sensitive CNN algorithm classifying in detail the data sent by the fog node.

2. The intrusion detection method for the Internet of Vehicles based on cloud-fog collaboration according to claim 1, characterized in that, in step 2, the preliminary classification with the CART decision tree algorithm at the fog node comprises:

the CART decision tree algorithm is used for fog-node detection in the Internet of Vehicles; the CART decision tree selects the attribute with the smallest GINI coefficient as the splitting attribute of the root node and uses binary recursive splitting to form a simple binary decision tree; the resulting tree is most efficient for binary classification, which suits the limited resources and real-time detection requirements of fog nodes.

3. The intrusion detection method for the Internet of Vehicles based on cloud-fog collaboration according to claim 1, characterized in that, in step 2, different computing tasks are allocated to the fog node and the cloud server to achieve collaboration, the specific collaboration steps comprising:

Step 21: the fog node performs binary classification, dividing the data into normal data and suspicious data; if the fog node detects normal data, the data is processed locally, which reduces the data sent to the cloud server and protects user privacy data in the intelligent transportation environment;

Step 22: on the fog node, if the detected data is abnormal, the fog node sends the data to the cloud server;

Step 23: on the cloud server, the cost-sensitive CNN algorithm performs multi-class classification on the abnormal data to obtain the specific attack type;

Step 24: the response system in the cloud server sends the result to the administrator on the fog-node side, and the administrator finds the infected smart device and takes measures, thereby realizing collaborative work between the fog node and the cloud server.

4. The intrusion detection method for the Internet of Vehicles based on cloud-fog collaboration according to claim 1, characterized in that, in step 3, the cost-sensitive CNN adds a cost matrix ξ between the softmax and loss layers of the CNN and updates the parameters automatically through joint optimization, comprising:

Step 31: the suspicious data screened by the fog node is passed to the cost-sensitive CNN algorithm, and the data label is updated to a specific attack label; to reduce the impact of class imbalance on the algorithm, the last layer of the CNN is modified by adding a cost matrix between the softmax and loss layers;

Step 32: before the classification loss is calculated, the result of the cost matrix is compressed into [0, 1], and the cross-entropy loss function is used as the loss function;

Step 33: the cost matrix ξ is added at the following position in the softmax formula:

$$y_n = \frac{\xi_{p,n} \exp(o_n)}{\sum_{k} \xi_{p,k} \exp(o_k)}$$

that is, the exponential of each element is multiplied by a corresponding cost value, and all the cost values together constitute the cost matrix, where $o_n$ in the softmax formula represents the output of the two-layer CNN;

Step 34: when the cross-entropy loss function is used, both the cost-sensitive CNN parameters θ and the cost matrix parameters ξ must be updated, and a joint optimization method is used to update θ and ξ.

5. The method according to claim 4, characterized in that updating θ and ξ by joint optimization in step 34 specifically comprises:

Step 51: to optimize θ, stochastic gradient descent with error backpropagation is used; to optimize ξ, the gradient descent algorithm is used again to calculate the step direction for updating the parameters, as follows;

Step 52: creating a cost-sensitive CNN network, initializing the neural network parameters θ, and initializing the cost matrix and the error to 1;

Step 53: starting the epoch loop, which runs until the maximum number of epochs is reached;

Step 54: calculating the gradient grad(x, d, F(ξ)) and updating the gradient parameters, where x is the data and d is the data label;

Step 55: in the batch loop, obtaining the output by forward propagation and the gradient by backpropagation, updating the network parameters, and exiting the loop when the maximum number of batches is reached;

Step 56: obtaining the error by forward propagation; if the error is greater than the set error, the learning rate of the cost matrix is reduced by a factor of 100, and the error is updated;

Step 57: stopping the epoch loop and exiting;

Step 58: obtaining the optimal values of the cost matrix parameters ξ and the learning parameters θ; the cost-sensitive CNN algorithm is trained on the feature vector data set to be identified, and the cost-sensitive CNN classifier is obtained.