Background
The rapid development of intelligent transportation enables vehicles to communicate with neighboring vehicles and with network infrastructure and to obtain traffic conditions in a timely manner, which improves safety and efficiency but also raises many security issues. The internet of vehicles is exposed to serious security risks from attackers: by exploiting a vulnerability, an attacker can gain access to the network and tamper with confidential data, which can cause accidents and undermine the goal of safe driving. An attack may destroy the system functions of the internet of vehicles, or may abuse the network for the attacker's own purposes. For example, an attacker may steal a vehicle-mounted device to penetrate the in-vehicle network, attack the vehicle from the external network, and then use the compromised vehicle to interfere with other users in the internet of vehicles environment, which may seriously harm users' interests and even threaten their personal safety.
The internet of vehicles is a fast-moving, highly dynamic network, so real-time information sharing among vehicles is very important. Because vehicles encounter each other only briefly and must act quickly on received information, the reliability of that information must be determined quickly. Cryptographic protection relies on pairwise keys and incurs computation, storage and time overhead; moreover, a stolen key allows intrusion into the internet of vehicles, and attacks launched from inside a vehicle are even harder to guard against. Therefore, intrusion detection systems must be deployed in the internet of vehicles to detect attacks.
In addition to these security challenges, a vehicle also needs to process data that it collects and receives from other vehicles. If the collected traffic data is sent to the cloud for the required computation and the result is then returned to the vehicles, computation and communication overhead among the vehicles can be limited and vehicle privacy can be improved. However, since road information is time-sensitive, this solution may be inefficient. In fog computing, fog nodes are located between end users and the cloud; with roadside units acting as fog nodes, fog computing offers an alternative for computing road conditions. In this case, each roadside unit collects traffic data from the vehicles within its area and extracts the road conditions by analyzing the collected data. Communication, detection and positioning between vehicles can then take place indirectly through the fog nodes.
Therefore, to solve the information security problem of the internet of vehicles, a cloud-fog cooperation-based intrusion detection method for the internet of vehicles is provided. The method takes full account of how the internet of vehicles differs from the traditional internet, chiefly its higher requirements on computing capacity, storage capacity and security, and constructs an intrusion detection model using machine learning and deep learning techniques, which have recently achieved breakthroughs in many fields.
Disclosure of Invention
The invention aims to provide an intrusion detection method for the internet of vehicles that addresses the network security problem of the highly dynamic internet of vehicles. The technical solution is a cloud-fog cooperation-based intrusion detection method for the internet of vehicles, comprising the following steps:
Step 1, converting the internet of vehicles data into a feature vector data set. The feature vectors specifically comprise the 802.11p protocol IP address and type; the time, source IP, destination IP, protocol name, packet size, port number and flags from the UDP datagram and IP datagram; and statistics such as the packet loss rate and the number of communication links. The feature vector data set is learned at the resource-limited fog node using the CART decision tree algorithm to obtain a CART decision tree classifier;
Step 2, preliminarily classifying the data at the fog node using the CART decision tree, the fog node sending the preliminary classification result to a cloud server;
Step 3, deploying a cost-sensitive CNN algorithm on the cloud server to classify in detail the data sent by the fog node.
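The conversion in step 1 can be illustrated with a minimal Python sketch. The record keys, the protocol and flag vocabularies, and the encoding choices (IP packed into an integer, one-hot categories) are assumptions made for illustration; the method itself does not prescribe them.

```python
# Illustrative only: the record keys and vocabularies below are assumptions,
# not field names defined by the method.
PROTOCOLS = ["UDP", "TCP", "ICMP"]
FLAGS = ["NONE", "SYN", "ACK", "FIN", "RST"]


def ip_to_int(ip):
    """Pack a dotted-quad IPv4 address into one integer."""
    a, b, c, d = (int(part) for part in ip.split("."))
    return (a << 24) | (b << 16) | (c << 8) | d


def one_hot(value, vocabulary):
    """Encode a categorical value as a 0/1 list over a fixed vocabulary."""
    return [1.0 if value == item else 0.0 for item in vocabulary]


def to_feature_vector(record):
    """Flatten one traffic record (a dict) into a numeric feature vector."""
    numeric = [
        float(record["time"]),
        float(ip_to_int(record["src_ip"])),
        float(ip_to_int(record["dst_ip"])),
        float(record["packet_size"]),
        float(record["port"]),
        float(record["loss_rate"]),
        float(record["link_count"]),
    ]
    categorical = one_hot(record["protocol"], PROTOCOLS) + one_hot(record["flag"], FLAGS)
    return numeric + categorical


# Hypothetical record, made up for the example.
record = {
    "time": 12.5, "src_ip": "10.0.0.1", "dst_ip": "10.0.0.7",
    "protocol": "UDP", "packet_size": 512, "port": 4000,
    "flag": "NONE", "loss_rate": 0.02, "link_count": 3,
}
vector = to_feature_vector(record)
print(len(vector), vector)
```

Each vector produced this way would be one row of the feature vector data set that the fog node's classifier is trained on.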
The key technical points of the invention are as follows. First, the CART decision tree algorithm is applied for the first time to fog node detection in the internet of vehicles. The algorithm has a simple model and simple rule extraction, and uses binary recursive splitting to build a simple decision tree in binary-tree form, which suits the limited resources of fog nodes and the need for real-time detection. Second, because fog node resources are limited while cloud server resources are effectively unlimited, different computing tasks are assigned to the fog nodes and the cloud server to achieve cooperative computing: the fog node divides the data into normal data and suspicious data and sends the suspicious data to the cloud server, where the specific attack category is detected. Third, the cloud server uses a cost-sensitive CNN, that is, a cost matrix is inserted between the softmax layer and the loss layer of the CNN and its parameters are updated automatically through joint optimization, which improves the attack detection accuracy.
The invention has the beneficial effects that:
First, the invention avoids sending all collected traffic data to the cloud for the required computation, which reduces end-to-end delay; traffic passing through the fog nodes is inspected with the CART algorithm, which meets the real-time requirement of the internet of vehicles.
Second, the cloud-fog cooperative mode lets the fog nodes and the cloud server work together, so that the storage and computing advantages of the different devices are better utilized to detect attack behavior in the environment.
Third, the improved CNN algorithm handles the imbalanced data of real scenarios better, detects attack behavior accurately, and protects the security of the data in the cloud server.
In conclusion, the CART algorithm detects attack behavior quickly and meets the real-time requirement of the fog node. The cloud-fog cooperative mode makes effective use of the resources in the fog nodes and the cloud server. On the internet of vehicles server side, the cost-sensitive CNN method improves the detection accuracy on imbalanced data. The invention can detect abnormal behavior in internet of vehicles network traffic and protect the network security of the internet of vehicles.
Detailed Description
The principles and features of this invention are described below in conjunction with the following drawings, which are set forth by way of illustration only and are not intended to limit the scope of the invention.
Embodiment One
In the first embodiment, the CART decision tree algorithm is adopted. The algorithm has a simple model and simple rule extraction, and uses binary recursive splitting to build a simple decision tree in binary-tree form, which makes it suitable as the intrusion detection algorithm on a fog node. The algorithm works as follows:
Step 1, calculating the GINI coefficient of each attribute and selecting the attribute with the minimum GINI coefficient as the splitting attribute of the root node. For a continuous attribute, a segmentation threshold is calculated, the attribute is discretized according to that threshold, and its GINI coefficient is calculated. For a discrete attribute, the sample set is divided according to the possible subsets of the attribute's values: if the attribute takes N distinct values, there are $2^N - 2$ valid subsets (the non-empty proper subsets). The subset with the smallest GINI coefficient is selected as the partition of the discrete attribute, and that smallest value is taken as the GINI coefficient of the attribute.
Calculation of the GINI coefficient:
(1) Assume the entire sample set is $S$ and the class set is $\{C_1, C_2, \ldots, C_n\}$, so that the data are divided into $n$ classes, each class $C_i$ corresponding to a sample subset $S_i$. Let $|S|$ be the number of samples in the sample set $S$ and $|C_i|$ be the number of samples in $S$ that belong to class $C_i$. The GINI coefficient of the sample set is defined as
$$\mathrm{Gini}(S) = 1 - \sum_{i=1}^{n} p_i^2$$
where $p_i = |C_i| / |S|$ is the probability that a sample in the sample set belongs to class $C_i$.
(2) When only binary splits are considered, attribute $A$ divides the training sample set $S$ into subsets $S_1$ and $S_2$, and the GINI coefficient of this partition of $S$ is
$$\mathrm{Gini}_A(S) = \sum_{k=1}^{2} \frac{|S_k|}{|S|}\,\mathrm{Gini}(S_k)$$
where $|S_k| / |S|$ is the weight of the k-th subset in the whole sample set.
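As a small worked example of the two GINI definitions, the following Python sketch computes the GINI coefficient of a label set and the weighted GINI coefficient of a binary partition. The class labels and sample counts are made up for illustration.

```python
from collections import Counter


def gini(labels):
    """GINI coefficient of a sample set: 1 minus the sum of squared class shares."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((count / n) ** 2 for count in counts.values())


def gini_split(left, right):
    """Weighted GINI coefficient of a binary partition into `left` and `right`."""
    n = len(left) + len(right)
    return (len(left) / n) * gini(left) + (len(right) / n) * gini(right)


# Hypothetical sample set: 6 normal samples and 2 attack samples.
labels = ["normal"] * 6 + ["attack"] * 2
whole = gini(labels)                      # 1 - (0.75**2 + 0.25**2)

# One candidate partition: a pure subset and a mixed subset.
left = ["normal"] * 4
right = ["normal"] * 2 + ["attack"] * 2
partition = gini_split(left, right)       # 0.5 * 0.0 + 0.5 * 0.5

print(whole, partition)
```

The partition scores lower than the undivided set, which is why a split of this kind would be preferred when choosing a splitting attribute.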
Step 2, if the splitting attribute is a continuous attribute, the sample set is divided into two parts, one with attribute values $\le T$ and one with attribute values $> T$, where $T$ is the segmentation threshold of the continuous attribute; if the splitting attribute is a discrete attribute, the sample set is divided into two parts according to whether the attribute value is contained in the proper subset of the discrete attribute that has the minimum GINI coefficient.
Step 3, for the two sample subsets $S_1$ and $S_2$ produced by the splitting attribute of the root node, child nodes of the tree are built recursively using the same method as in step 1. The process repeats until all samples in every child node belong to the same category or no attribute remains that can be selected as a splitting attribute.
Step 4, pruning the generated decision tree.
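The tree-growing steps above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the full procedure: it handles continuous attributes only, takes candidate thresholds as midpoints between consecutive distinct values, and omits the pruning step. The feature values and labels are hypothetical.

```python
from collections import Counter


def gini(labels):
    """GINI coefficient of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values()) if n else 0.0


def best_split(rows, labels):
    """Return (feature, threshold, score) with the lowest weighted GINI, or None."""
    best = None
    n = len(rows)
    for f in range(len(rows[0])):
        values = sorted(set(row[f] for row in rows))
        # Candidate thresholds: midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y for row, y in zip(rows, labels) if row[f] <= t]
            right = [y for row, y in zip(rows, labels) if row[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[2]:
                best = (f, t, score)
    return best


def build_tree(rows, labels):
    """Recursively grow a binary tree; leaves are class labels."""
    if len(set(labels)) == 1:
        return labels[0]                      # all samples share one class
    split = best_split(rows, labels)
    if split is None:                         # no attribute can split further
        return Counter(labels).most_common(1)[0][0]
    f, t, _ = split
    left = [i for i, row in enumerate(rows) if row[f] <= t]
    right = [i for i, row in enumerate(rows) if row[f] > t]
    return {
        "feature": f,
        "threshold": t,
        "left": build_tree([rows[i] for i in left], [labels[i] for i in left]),
        "right": build_tree([rows[i] for i in right], [labels[i] for i in right]),
    }


def predict(tree, row):
    """Walk from the root to a leaf and return its class label."""
    while isinstance(tree, dict):
        branch = "left" if row[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[branch]
    return tree


# Hypothetical training data: [packet_size, loss_rate] per sample.
rows = [[100, 0.01], [120, 0.02], [1400, 0.30], [1500, 0.40]]
labels = ["normal", "normal", "suspicious", "suspicious"]
tree = build_tree(rows, labels)
print(tree)
print(predict(tree, [130, 0.05]), predict(tree, [1450, 0.20]))
```

A production fog node would more likely use an existing CART implementation with pruning; the sketch only shows how the GINI-based choice of split drives the recursion.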
On this basis, the invention uses detection time and recall, evaluation indexes commonly used in the field of machine learning, to evaluate the effectiveness and reliability of the algorithm. The evaluation criteria are defined as follows:
Detection time = T2 (detection completion time) - T1 (detection start time)
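The two evaluation indexes can be computed as in the following Python sketch. The detection time follows the T2 - T1 definition given here. The recall formula is not reproduced in the text, so the sketch assumes the standard definition TP / (TP + FN); the label names and the stand-in detector are hypothetical.

```python
import time


def recall(y_true, y_pred, positive="attack"):
    """Standard recall (assumed definition): TP / (TP + FN) for the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp / (tp + fn) if (tp + fn) else 0.0


def timed_detection(detector, samples):
    """Run `detector` on every sample and return (predictions, T2 - T1 in seconds)."""
    t1 = time.perf_counter()                  # T1: detection start time
    predictions = [detector(sample) for sample in samples]
    t2 = time.perf_counter()                  # T2: detection completion time
    return predictions, t2 - t1


# Hypothetical stand-in detector: flags large packets as attacks.
def detector(sample):
    return "attack" if sample["packet_size"] > 1000 else "normal"


samples = [{"packet_size": s} for s in (1500, 200, 90, 1400)]
y_true = ["attack", "attack", "normal", "attack"]
y_pred, detection_time = timed_detection(detector, samples)
print(y_pred, recall(y_true, y_pred), detection_time)
```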
Embodiment Two
As shown in fig. 1, the second embodiment presents the overall structure of the invention in the internet of vehicles environment, which is divided into three parts: the cloud server, the fog nodes and the terminal devices. As shown in fig. 2, in order to make reasonable use of the resources of the fog computing and cloud computing systems and to execute the intrusion detection task effectively, the cloud-fog cooperative detection method of the invention comprises:
Step 21, the fog node performs binary classification, dividing the data into normal data and suspicious data. Data that the fog node classifies as normal is processed locally, which reduces the amount of data sent to the cloud server and thereby protects user privacy data in the intelligent transportation environment.
Step 22, if the fog node classifies data as suspicious, it sends that data to the cloud server.
Step 23, the cloud server performs multi-class classification on the suspicious data using the cost-sensitive CNN algorithm to obtain the specific attack type.
Step 24, the response system in the cloud server sends the result to an administrator at the fog node, who can then locate the infected intelligent device and take corresponding measures. In this way the fog nodes and the cloud server work cooperatively.
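The division of work in steps 21 to 24 can be summarized in a short Python sketch. The function names, the report layout and the two stand-in classifiers are hypothetical; in the method itself the fog-side classifier is the CART decision tree and the cloud-side classifier is the cost-sensitive CNN.

```python
def fog_node_filter(samples, is_suspicious):
    """Steps 21-22: binary classification at the fog node."""
    normal, suspicious = [], []
    for sample in samples:
        (suspicious if is_suspicious(sample) else normal).append(sample)
    return normal, suspicious


def cloud_classify(suspicious, classify_attack):
    """Step 23: multi-class classification of suspicious data at the cloud."""
    return [(sample["device"], classify_attack(sample)) for sample in suspicious]


def cooperative_detection(samples, is_suspicious, classify_attack):
    """Steps 21-24: only suspicious data leaves the fog node; the cloud reports back."""
    normal, suspicious = fog_node_filter(samples, is_suspicious)
    report = cloud_classify(suspicious, classify_attack)
    return {
        "handled_locally": len(normal),       # never sent to the cloud
        "sent_to_cloud": len(suspicious),
        "report": report,                     # step 24: returned to the administrator
    }


# Hypothetical stand-ins for the CART classifier and the cost-sensitive CNN.
def is_suspicious(sample):
    return sample["packet_size"] > 1000


def classify_attack(sample):
    return "DoS" if sample["loss_rate"] > 0.2 else "probe"


samples = [
    {"device": "car-1", "packet_size": 200, "loss_rate": 0.01},
    {"device": "car-2", "packet_size": 1500, "loss_rate": 0.35},
    {"device": "car-3", "packet_size": 1200, "loss_rate": 0.05},
]
result = cooperative_detection(samples, is_suspicious, classify_attack)
print(result)
```

Keeping the filter and the detailed classifier behind separate functions mirrors the intent of the design: the inexpensive decision runs where resources are limited, and only the flagged subset incurs communication with the cloud.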
Embodiment Three
Embodiment three improves the CNN algorithm used on the cloud server. In practice, the network traffic passing through intelligent transportation fog nodes consists of a large amount of normal traffic and a small amount of abnormal traffic, so the invention applies automatically learned cost-sensitive training to a convolutional neural network for imbalanced data.
Step 1, the invention introduces a new cost matrix $\xi$ that modifies the last layer of the CNN, between the softmax layer and the loss layer, so that the model can correctly classify infrequent classes. The CNN output $o$ is modified using the cost matrix $\xi$ according to the function $\mathcal{F}$ as follows:
$$y^{(i)} = \mathcal{F}\left(\xi_p, o^{(i)}\right), \quad \text{such that } y_p^{(i)} \ge y_j^{(i)} \quad \forall j \ne p$$
where $y^{(i)}$ denotes the modified output, $p$ denotes the desired class, $\mathcal{F}$ denotes a function, in particular the one associated with the cross-entropy loss, and $o^{(i)}$ is the output of the CNN. The condition $y_p^{(i)} \ge y_j^{(i)}$ indicates that, after modification, the desired class outputs a higher value than the other classes.
Step 2, to address the class imbalance problem in CNN training, the invention introduces a cost-sensitive error function, expressed as the average loss over the training set:
$$E(\theta, \xi) = \frac{1}{M} \sum_{i=1}^{M} l\left(d^{(i)}, y_{\theta,\xi}^{(i)}\right)$$
where the predicted output $y$ before the loss layer depends on the parameters $\theta$ and $\xi$; $\theta$ denotes the CNN parameters, $\xi$ the cost matrix parameters, $M$ the total number of training samples, and $N$ the total number of neurons in the output layer; $l(d^{(i)}, y_{\theta,\xi}^{(i)})$ is the cross-entropy loss function; $d \in \{0, 1\}^{1 \times N}$ is the desired output (with $\sum_n d_n := 1$); and $y^{(i)}$ is the resulting softmax value. When the model performs poorly on the training set the error is larger, and the learning algorithm seeks the optimal parameters $(\theta^*, \xi^*)$ that minimize this average loss. The optimization objective is therefore
$$(\theta^*, \xi^*) = \arg\min_{\theta, \xi} E(\theta, \xi) \quad \text{(Equation 4)}$$
The loss function in this equation is the cross-entropy loss, which maximizes the closeness of the prediction to the desired output, as follows:
$$l(d, y) = -\sum_{n} d_n \log y_n \quad \text{(Equation 5)}$$
where $d_n$ is the desired output (with $\sum_n d_n := 1$) and $y_n$ is the resulting softmax value. Here $y_n$ depends on the class-dependent cost matrix $\xi$ and on the output $o_n$; the cost matrix enters at the following position:
$$y_n = \frac{\xi_{p,n} \exp(o_n)}{\sum_{k} \xi_{p,k} \exp(o_k)}$$
where the ordinary softmax output of the n-th element is
$$\frac{\exp(o_n)}{\sum_{k} \exp(o_k)}$$
which is the ratio of the exponential of the n-th element to the sum of the exponentials of all elements.
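To make the loss in Step 2 concrete, the following Python sketch evaluates a cost-weighted softmax, the cross-entropy loss of Equation 5, and the average loss over a training set. It assumes that the cost enters as a multiplicative weight on each exponential, taken from the cost matrix row of the desired class; this is one common form of cost-sensitive softmax and is an assumption here. The scores and cost values are invented.

```python
import math


def softmax(o):
    """Ordinary softmax: exp(o_n) divided by the sum of all exponentials."""
    m = max(o)                                # shift for numerical stability
    exps = [math.exp(v - m) for v in o]
    total = sum(exps)
    return [e / total for e in exps]


def cost_sensitive_softmax(o, xi_row):
    """Assumed form: y_n = xi[p][n] * exp(o_n) / sum_k xi[p][k] * exp(o_k)."""
    m = max(o)
    weighted = [cost * math.exp(v - m) for cost, v in zip(xi_row, o)]
    total = sum(weighted)
    return [w / total for w in weighted]


def cross_entropy(d, y):
    """Equation 5: l(d, y) = -sum_n d_n * log(y_n), with d one-hot."""
    return -sum(dn * math.log(yn) for dn, yn in zip(d, y) if dn > 0)


def average_loss(outputs, targets, xi):
    """E(theta, xi): mean cross-entropy of the cost-modified outputs."""
    total = 0.0
    for o, d in zip(outputs, targets):
        p = d.index(1)                        # desired class of this sample
        total += cross_entropy(d, cost_sensitive_softmax(o, xi[p]))
    return total / len(outputs)


# Hypothetical two-class example: class 0 = normal, class 1 = attack (rare).
outputs = [[0.0, 0.0], [2.0, 0.0]]            # raw CNN scores per sample
targets = [[1, 0], [0, 1]]                    # one-hot desired outputs
uniform = [[1.0, 1.0], [1.0, 1.0]]            # all-ones costs: plain softmax
skewed = [[1.0, 1.0], [0.25, 1.0]]            # invented costs for the rare class

print(average_loss(outputs, targets, uniform))
print(average_loss(outputs, targets, skewed))
```

With an all-ones cost matrix the modified output reduces to the ordinary softmax, which gives a convenient sanity check for any implementation of this layer.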
Step 3, learning optimal parameters
When the cross-entropy loss function is used, the goal is to learn the network parameters $\theta$ and the class-dependent cost parameters $\xi$ jointly. For joint optimization, the two sets of parameters are solved alternately: one is held fixed while the cost is minimized with respect to the other. The algorithm is as follows:
The CNN algorithm is improved by this method, and the improved algorithm is deployed on the intelligent transportation cloud server to classify intrusion data.
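The alternating scheme of Step 3 can be illustrated on a toy problem. The Python sketch below is not the training algorithm of the invention, whose listing is not reproduced here; it only shows the pattern of holding one parameter block fixed while taking a descent step on the other. The quadratic `toy_loss`, the finite-difference gradients, the learning rate and the round count are all assumptions chosen so that the example is self-contained.

```python
def numeric_grad(f, x, eps=1e-6):
    """Central-difference gradient of f at the list x."""
    grad = []
    for i in range(len(x)):
        up = x[:i] + [x[i] + eps] + x[i + 1:]
        down = x[:i] + [x[i] - eps] + x[i + 1:]
        grad.append((f(up) - f(down)) / (2 * eps))
    return grad


def alternating_minimize(loss, theta, xi, rounds=1000, lr=0.1):
    """Alternate gradient steps: update theta with xi fixed, then xi with theta fixed."""
    for _ in range(rounds):
        # Hold xi fixed and take one descent step on theta.
        g_theta = numeric_grad(lambda t: loss(t, xi), theta)
        theta = [t - lr * g for t, g in zip(theta, g_theta)]
        # Hold the updated theta fixed and take one descent step on xi.
        g_xi = numeric_grad(lambda c: loss(theta, c), xi)
        xi = [c - lr * g for c, g in zip(xi, g_xi)]
    return theta, xi


def toy_loss(theta, xi):
    """Hypothetical coupled objective with its minimum at theta = 2, xi = 1."""
    return (theta[0] - 2 * xi[0]) ** 2 + (xi[0] - 1) ** 2


theta_opt, xi_opt = alternating_minimize(toy_loss, [0.0], [0.0])
print(theta_opt, xi_opt, toy_loss(theta_opt, xi_opt))
```

In an actual CNN the step on $\theta$ would be a backpropagation update over mini-batches, and the update of $\xi$ would follow whatever cost-update rule the training procedure defines.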
Step 4, the structure of the CNN model is shown in fig. 3. The model consists of two convolutional layers, two pooling layers, a fully-connected layer and a dropout layer, and a softmax classifier.
Based on the method, the invention adopts the detection accuracy (precision) evaluation index commonly used in the field of machine learning to evaluate the effectiveness and reliability of the algorithm. The evaluation criteria are defined as follows:
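As an illustration of this evaluation index, the following Python sketch computes overall accuracy and per-class precision for a multi-class result. The defining formula is not reproduced in the text, so the sketch assumes the standard definitions: accuracy as the share of correct predictions and precision as TP / (TP + FP). The attack class names are hypothetical.

```python
def accuracy(y_true, y_pred):
    """Share of samples whose predicted class equals the true class."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def precision(y_true, y_pred, cls):
    """Standard precision (assumed definition) for one class: TP / (TP + FP)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if p == cls and t == cls)
    fp = sum(1 for t, p in zip(y_true, y_pred) if p == cls and t != cls)
    return tp / (tp + fp) if (tp + fp) else 0.0


# Hypothetical cloud-side classification result.
y_true = ["dos", "dos", "probe", "normal", "probe"]
y_pred = ["dos", "probe", "probe", "normal", "probe"]

print(accuracy(y_true, y_pred))
for cls in ("dos", "probe", "normal"):
    print(cls, precision(y_true, y_pred, cls))
```

Reporting precision per attack class, rather than a single aggregate, makes the effect of class imbalance visible, since a rare class can score poorly while overall accuracy stays high.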
It should be understood that although this description is organized by embodiments, not every embodiment contains only a single independent technical solution; the description is arranged this way only for clarity. Those skilled in the art should treat the description as a whole, and the technical solutions of the embodiments may be combined as appropriate to form other implementations that those skilled in the art can understand.