CN109885092B

CN109885092B - Unmanned aerial vehicle flight control data identification method

Info

Publication number: CN109885092B
Application number: CN201910229398.9A
Authority: CN
Inventors: 毛璀
Original assignee: Xidian University
Current assignee: Xidian University
Priority date: 2019-03-25
Filing date: 2019-03-25
Publication date: 2020-10-13
Anticipated expiration: 2039-03-25
Also published as: CN109885092A

Abstract

The invention discloses a method for identifying the flight control data of an unmanned aerial vehicle (UAV) and relates to the field of UAV communication. Communication packets between the UAV and its control terminal are captured while the UAV is hovering. After the packets are preprocessed, feature vectors are extracted with the n-gram model from natural language processing and clustered with the k-means++ algorithm. A One-Class-SVM model is then trained on each cluster, producing a hypersphere in a high-dimensional space for each cluster. Next, following the approach of malicious traffic detection, the traffic generated while the UAV is under manual control is tested against every previously trained model. Because flight control data generated under manual control does not fit any hover-state model, all models mark it as abnormal traffic, and the flight control data is identified from the combined marks of all models.

Description

A method for identifying UAV flight control data

Technical Field

The invention relates to the field of unmanned aerial vehicles, and in particular to a method for identifying UAV flight control data.

Background

UAVs are widely used in both military and civilian fields, but current laws and regulations are incomplete. Many UAVs fly over sensitive military areas, airports, and similar sites, seriously endangering national and public security.

Strengthening the supervision of UAVs therefore also requires technical means, which in turn requires reverse engineering their protocols: only once the protocol format has been reversed can a UAV be captured by technical means and brought under control. However, the signals exchanged between the remote controller and the UAV are not limited to those by which the controller commands the UAV's flight state. Because the UAV may be carrying out tasks such as aerial photography or crop protection, other types of signals are generated as well, such as flight attitude signals and GPS signals. The most valuable of these is the flight control signal: once it has been reversed, the UAV can be hijacked and controlled in real time.

At the Black Hat conference in 2016, Nils hijacked a target UAV by forging remote control signals and took control of it. At the 3.15 Gala in 2016, Tencent's security team took control of a target UAV by forging the remote control signal of a DJI Phantom 3s. The same technique can be used for UAV supervision: once a UAV is found to have entered a no-fly zone, its control signals can be forged to seize control of it and bring it under management.

All of the above approaches depend on disassembling the hardware, studying the chips inside it, and obtaining flight control data through dynamic debugging. This has clear limitations. For example, the newer generation of DJI products has obscured the chip model markings in the remote controller, so the chips can no longer be identified and studied. The approach also depends heavily on the UAV model, so each model requires its own dedicated analysis, which is time-consuming and labor-intensive. To address these limitations of the prior art, the present invention proposes a new method for identifying UAV flight control data that does not rely on hardware disassembly and is highly automated.

Summary of the Invention

The purpose of the present invention is to provide a method for identifying UAV flight control data that overcomes the limitations of the traditional approach: chips that cannot be identified and studied, heavy dependence on the UAV model, and the time-consuming, labor-intensive analysis required for each model.

To achieve the above purpose, the present invention provides the following technical solution: a method for identifying UAV flight control data, comprising the following steps:

Step 1: Keep the UAV hovering and use a packet capture tool to capture the packets exchanged between the UAV and the controller; at this point the packets are a binary data stream;

Step 2: Convert the binary data stream into displayable hexadecimal characters;

Step 3: Use the n-gram model from natural language processing to extract the features of each packet;

Step 4: Use the k-means++ algorithm to cluster the hover-state UAV communication packets on the feature vectors extracted in Step 3;

Step 5: Train a One-Class-SVM model on each cluster, obtaining multiple models of hover-state UAV communication data;

Step 6: Fly the UAV with irregular controller inputs, capture the packets between the UAV and the controller with the packet capture tool, and convert the captured binary data stream into displayable hexadecimal strings;

Step 7: Use the n-gram model to extract a feature vector for each communication record captured in the irregular flight state;

Step 8: Pass the feature vector of each record through each of the One-Class-SVM models trained on hover-state data in Step 5; a record that belongs to the cluster a model was trained on is marked 1 by that model, and otherwise it is marked -1;

Step 9: Because data generated while the UAV is under manual control does not belong to any hover-state cluster, it is marked -1 by all models; select the records marked -1 by every model to form a candidate set;

Step 10: Use the Needleman-Wunsch algorithm to compute the similarity score between every pair of sequences in the candidate set, forming a score matrix;

Step 11: A misidentified sequence has relatively low similarity scores with the other sequences in the score matrix; remove the misidentified sequences according to their scores, finally obtaining the flight control protocol data;
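Steps 4, 5, 8 and 9 can be sketched as follows. This is a minimal illustration rather than the patented implementation: it assumes scikit-learn and NumPy, and the function names, the cluster count, and the `nu` and `gamma` values are hypothetical choices that the text does not specify. The synthetic arrays stand in for real n-gram feature vectors.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import OneClassSVM


def train_hover_models(hover_features, n_clusters, nu=0.05, gamma="scale"):
    """Steps 4-5: cluster hover-state features, then fit one model per cluster.

    n_clusters, nu and gamma are illustrative assumptions.
    """
    km = KMeans(n_clusters=n_clusters, init="k-means++", n_init=10, random_state=0)
    labels = km.fit_predict(hover_features)
    models = []
    for c in range(n_clusters):
        members = hover_features[labels == c]
        # Gaussian (RBF) kernel, as the description prescribes
        models.append(OneClassSVM(kernel="rbf", nu=nu, gamma=gamma).fit(members))
    return models


def candidate_mask(models, flight_features):
    """Steps 8-9: True where every hover-state model marks the record -1."""
    votes = np.stack([m.predict(flight_features) for m in models])
    return np.all(votes == -1, axis=0)


# Synthetic stand-ins: two hover-state clusters, then flight-state records,
# half resembling hover traffic and half far away from it.
rng = np.random.default_rng(0)
hover = np.vstack([rng.normal(0.0, 0.1, (50, 4)), rng.normal(5.0, 0.1, (50, 4))])
flight = np.vstack([rng.normal(0.0, 0.1, (5, 4)), rng.normal(20.0, 0.1, (5, 4))])

models = train_hover_models(hover, n_clusters=2)
mask = candidate_mask(models, flight)
```

Records for which `mask` is true form the candidate set of Step 9; records accepted by at least one hover-state model are treated as ordinary background traffic.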

The feature vector in Step 3 is extracted as follows:

(1) For n = 1, 2, 3, 4, 5, 6 in turn, perform n-gram segmentation on each record in the captured packets;

(2) For each value of n, count the frequency of each resulting segment;

(3) For each value of n, sort the frequencies of all segments in ascending order; denote the frequency of each segment as y and its rank as x, use regression analysis to fit (log y) = 1/(log x), compute the fitting coefficient, and select the n for which the fitting coefficient is largest;

(4) Using the value of n obtained above, split each record in the captured packets into segments of length n, count the frequency of every subsequence of length n, compute from these frequencies the percentage of occurrences of each distinct length-n subsequence, and assemble the percentages into a vector that serves as the feature vector of the current sequence.
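Sub-steps (1), (2) and (4), together with the hexadecimal conversion of Step 2, can be illustrated with a short sketch. It uses only the Python standard library and makes several assumptions the text leaves open: n-grams are taken with a sliding window of stride 1, the vector positions follow a sorted vocabulary built from the whole capture, and n is passed in directly (the regression-based selection of sub-step (3) is not shown). All function names and the sample bytes are hypothetical.

```python
from collections import Counter


def to_hex(packet: bytes) -> str:
    """Step 2: render a raw packet as a hexadecimal string."""
    return packet.hex()


def ngrams(hex_str: str, n: int) -> list[str]:
    """Split a hex string into overlapping n-grams (stride 1 is an assumption)."""
    return [hex_str[i:i + n] for i in range(len(hex_str) - n + 1)]


def build_vocabulary(hex_packets: list[str], n: int) -> list[str]:
    """Collect every n-gram seen in the capture, in a fixed sorted order."""
    return sorted({g for p in hex_packets for g in ngrams(p, n)})


def feature_vector(hex_str: str, n: int, vocabulary: list[str]) -> list[float]:
    """Percentage of the record's n-grams that fall on each vocabulary entry."""
    counts = Counter(ngrams(hex_str, n))
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(vocabulary)
    return [100.0 * counts[g] / total for g in vocabulary]


# Made-up packets, for illustration only
packets = [bytes([0x55, 0xAA, 0x01, 0x02]), bytes([0x55, 0xAA, 0x01, 0x03])]
hex_packets = [to_hex(p) for p in packets]
vocab = build_vocabulary(hex_packets, n=2)
vectors = [feature_vector(h, 2, vocab) for h in hex_packets]
```

Building the vocabulary once over the whole capture keeps every feature vector the same length, which the clustering in Step 4 needs.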

Preferably, the models used in Step 8 are trained as follows:

(1) For each cluster produced by the clustering, extract feature vectors using the feature extraction method of Step 3;

(2) Select the Gaussian kernel as the kernel function of the One-Class-SVM;

(3) Cluster by cluster, take the feature vector of each record in the cluster as input and map it into a high-dimensional space with the Gaussian kernel;

(4) Find a hypersphere in the high-dimensional space that encloses as much of the input data as possible while keeping its radius as small as possible.
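The text does not state the optimization problem behind sub-steps (3) and (4). One common way to write a smallest-enclosing-hypersphere objective of this kind is the support vector data description form below, given here only as an assumed reading. The symbols are not from the original: $a$ is the sphere centre, $R$ its radius, $\xi_i$ slack variables that let some points fall outside, $C$ the trade-off weight, $\gamma$ the kernel width, and $m$ the number of records in the cluster.

```latex
\begin{aligned}
\min_{R,\,a,\,\xi}\quad & R^2 + C \sum_{i=1}^{m} \xi_i \\
\text{s.t.}\quad & \lVert \phi(x_i) - a \rVert^2 \le R^2 + \xi_i, \qquad \xi_i \ge 0, \quad i = 1, \dots, m, \\
& k(x, x') = \langle \phi(x), \phi(x') \rangle = \exp\!\left(-\gamma \lVert x - x' \rVert^2\right).
\end{aligned}
```

A larger $C$ pulls more training points inside the sphere at the cost of a larger radius, which matches the two competing goals named in sub-step (4).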

Preferably, the error elimination in Step 11 is carried out as follows:

(1) Select the sequences marked -1 by all models;

(2) Use the Needleman-Wunsch algorithm to compute the similarity score between every pair of selected sequences, forming a score matrix;

(3) Scan the score matrix row by row. If the score in column n of the current row is far smaller than the scores in the other columns, examine row n; if the overall scores of row n are far smaller than those of the other rows, the element at position n is a misidentified element and is deleted.
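The scoring and elimination above can be sketched in plain Python. The scoring parameters (match +1, mismatch -1, linear gap -1) and the `tolerance` cutoff are assumptions, since the text gives no numeric values. The row-by-row scan of sub-step (3) is also simplified here to a comparison of each sequence's mean score against the median, so this is an approximation of the described procedure rather than a faithful copy.

```python
from statistics import median


def nw_score(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1) -> int:
    """Needleman-Wunsch global alignment score with a linear gap penalty."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur.append(max(diag, prev[j] + gap, cur[j - 1] + gap))
        prev = cur
    return prev[-1]


def score_matrix(seqs: list[str]) -> list[list[int]]:
    """Pairwise similarity scores between all candidate sequences."""
    return [[nw_score(s, t) for t in seqs] for s in seqs]


def drop_misidentified(seqs: list[str], tolerance: float = 4.0) -> list[str]:
    """Keep sequences whose mean score against the others is not far below the median.

    `tolerance` is a hypothetical threshold for "far smaller".
    """
    k = len(seqs)
    if k < 3:
        return list(seqs)  # too few sequences to single out an outlier
    m = score_matrix(seqs)
    # Mean of each row, ignoring the self-score on the diagonal
    row_mean = [sum(m[i][j] for j in range(k) if j != i) / (k - 1) for i in range(k)]
    cutoff = median(row_mean) - tolerance
    return [s for s, r in zip(seqs, row_mean) if r >= cutoff]


# Made-up hex strings: three similar records and one unrelated record
candidates = ["55aa0102", "55aa0103", "55aa0104", "9e7c7bd6"]
kept = drop_misidentified(candidates)
```

The median is used as the reference so that a single misidentified sequence does not shift the baseline it is compared against.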

Compared with the prior art, the beneficial effects of the present invention are as follows. The invention applies the n-gram model to feature extraction from UAV communication data, clusters the hover-state communication data, and builds a One-Class-SVM model for each type of data, so that flight control data generated under controller input is identified in the manner of malicious traffic detection. To eliminate misidentifications, the Needleman-Wunsch algorithm is used to compute the similarity score between every pair of data sequences marked -1 by all models, forming a score matrix; misidentified data is excluded on the basis of this matrix, and the flight control protocol is thereby identified. The prior art mainly relies on disassembling the remote controller and reverse engineering the control-side software to identify the flight control protocol, which is time-consuming, labor-intensive, and not general. The present invention is widely applicable, relies on machine learning, and offers fast identification and high accuracy.

Description of the Drawings

Fig. 1 is the system flow chart of the present invention;

Fig. 2 is the feature extraction flow chart of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments obtained by a person of ordinary skill in the art on the basis of these embodiments, without creative effort, fall within the scope of protection of the present invention.

Referring to Fig. 1, a method for identifying UAV flight control data comprises the following steps:

Step 11: A misidentified sequence has relatively low similarity scores with the other sequences in the score matrix; remove the misidentified sequences according to their scores, finally obtaining the flight control protocol data.

Example 1

As a preferred embodiment of the present invention, the feature vector in Step 3 is extracted as follows:

(3) For each value of n, sort the frequencies of all segments in ascending order; denote the frequency of each segment as y and its rank as x, use regression analysis to fit (log y) = 1/(log x), compute the fitting coefficient, and select the n for which the fitting coefficient is largest;

(4) Using the value of n obtained above, split each record in the captured packets into segments of length n, count the frequency of every subsequence of length n, compute from these frequencies the percentage of occurrences of each distinct length-n subsequence, and assemble the percentages into a vector that serves as the feature vector of the current sequence. This allows feature vectors to be extracted quickly and reliably.

Example 2

As a preferred embodiment of the present invention, the models used in Step 8 are trained as follows:

Example 3

As a preferred embodiment of the present invention, the error elimination in Step 11 is carried out as follows:

It will be apparent to those skilled in the art that the present invention is not limited to the details of the above exemplary embodiments and can be embodied in other specific forms without departing from its spirit or essential characteristics. The embodiments are therefore to be regarded in all respects as illustrative and not restrictive. The scope of the invention is defined by the appended claims rather than by the foregoing description, and all changes that fall within the meaning and range of equivalents of the claims are intended to be embraced by the invention. No reference sign in the claims shall be construed as limiting the claim concerned.

Claims

1. A method for identifying unmanned aerial vehicle flight control data, characterized by comprising the following steps:

Step 1: Keep the UAV hovering and use a packet capture tool to capture the packets exchanged between the UAV and the controller; at this point the packets are a binary data stream;

Step 2: Convert the binary data stream into displayable hexadecimal characters;

Step 3: Use the n-gram model from natural language processing to extract the features of each packet;

Step 4: Use the k-means++ algorithm to cluster the UAV communication packets captured under static (hovering) conditions on the feature vectors extracted in Step 3;

Step 5: Train a One-Class-SVM model on each cluster, obtaining multiple models of hover-state UAV communication data;

Step 6: Fly the UAV with irregular controller inputs, capture the packets between the UAV and the controller with the packet capture tool, and convert the captured binary data stream into displayable hexadecimal strings;

Step 7: Use the n-gram model to extract a feature vector for each communication record captured in the irregular flight state;

Step 8: Pass the feature vector of each record through each of the One-Class-SVM models trained on hover-state data in Step 5; a record that belongs to the cluster a model was trained on is marked 1 by that model, and otherwise it is marked -1;

Step 9: Because data generated while the UAV is under manual control does not belong to any hover-state cluster, it is marked -1 by all models; select the records marked -1 by every model to form a candidate set;

Step 10: Use the Needleman-Wunsch algorithm to compute the similarity score between every pair of sequences in the candidate set, forming a score matrix;

Step 11: A misidentified sequence has relatively low similarity scores with the other sequences in the score matrix; remove the misidentified sequences according to their scores, finally obtaining the flight control protocol data;

wherein the feature vector in Step 3 is extracted as follows:

(1) For n = 1, 2, 3, 4, 5, 6 in turn, perform n-gram segmentation on each record in the captured packets;

(2) For each value of n, count the frequency of each resulting segment;

(3) For each value of n, sort the frequencies of all segments in ascending order; denote the frequency of each segment as y and its rank as x, use regression analysis to fit (log y) = 1/(log x), compute the fitting coefficient, and select the n for which the fitting coefficient is largest;

(4) Using the value of n obtained above, split each record in the captured packets into segments of length n, count the frequency of every subsequence of length n, compute from these frequencies the percentage of occurrences of each distinct length-n subsequence, and assemble the percentages into a vector that serves as the feature vector of the current sequence.

2. The method for identifying UAV flight control data according to claim 1, characterized in that the models used in Step 8 are trained as follows:

(1) Cluster the hover-state UAV data, dividing it into n clusters;

(2) For each cluster, extract feature vectors using the same feature extraction method that was used for clustering;

(3) Input the feature vectors of each cluster into a One-Class-SVM model for training, obtaining for each model a hypersphere in the high-dimensional space that encloses its normal data; data inside the sphere is marked 1, and data outside it is marked -1.

3. The method for identifying UAV flight control data according to claim 1, characterized in that the misidentified outliers in Step 11 are eliminated as follows:

Use the Needleman-Wunsch algorithm to compute the score between every pair of sequences marked -1 by all models, forming a matrix;

The similarity score between a misidentified sequence and a correctly identified sequence is significantly lower than the score between two correctly identified sequences, and outliers are excluded on the basis of this difference.