CN102014025B

CN102014025B - Method for detecting P2P botnet structure based on network flow clustering

Info

Publication number: CN102014025B
Application number: CN201010573650A
Authority: CN
Inventors: 夏春和; 段俊锋; 姚珊; 王海泉; 冯杰
Original assignee: Beihang University
Current assignee: Beihang University
Priority date: 2010-12-06
Filing date: 2010-12-06
Publication date: 2012-09-05
Anticipated expiration: 2030-12-06
Also published as: CN102014025A

Abstract

The invention discloses a method for detecting the structure of a P2P botnet based on network flow clustering. Detection is carried out by a real-time communication data acquisition module, a datagram record filtering module, a network flow extraction module, a network flow record filtering module, a network flow clustering module, and a data association and result display module. The basic idea is that the defender exploits the regularity of command and control communication between P2P botnet nodes, namely that its duration, number of datagrams, number of bytes and similar quantities are characteristic. By identifying command and control communication in the communication data of the monitored network, the method determines the P2P botnet nodes in that network and the command and control relationships between them, and from these derives the P2P botnet structure. The main innovation of the invention is that network flows with similar features are grouped by a clustering method and compared against the features in a command and control communication feature set, so that normal communication is distinguished from P2P botnet communication and the structure of the P2P botnet is detected.

Description

A Method of Detecting P2P Botnet Structure Based on Network Flow Clustering

Technical Field

The invention relates to a method for discovering network structure, and more particularly to a method for detecting P2P botnet structure based on network flow clustering.

Background Art

A botnet is a network formed by a large number of computer programs that control computer resources without authorization and can accept remote control commands to perform corresponding operations. It is a new form of attack that evolved from traditional malicious code and gives attackers a covert, flexible and efficient one-to-many command and control mechanism, with which they can control a large number of bot hosts to carry out information theft, distributed denial-of-service attacks, spam delivery and similar attacks. The one-to-many command and control mechanism is its essential feature. An attacker is a computer or person that uses a computer network to take actions that disrupt, deny, degrade or destroy information residing on computers and computer networks, or the computers and networks themselves.

A defender is a person who takes a series of actions within a computer network and its information systems to protect against, monitor, analyze, detect and respond to unauthorized activity.

Botnet command and control mechanisms come in several modes: centralized, P2P and random. A P2P botnet is characterized by a peer-to-peer relationship between its nodes (bot processes): the network has no typical command and control server, and each servent node can act as both client and server. Compared with centralized botnets, P2P botnets are harder to detect and more covert.

Referring to Figure 1, the figure contains nine nodes, A through I, which form a P2P botnet structure through the command and control activity C&C. The attacker can deliver an attack instruction AI_n through certain preset nodes, and the nodes of the P2P botnet propagate AI_n to every node in the network through the command and control activity C&C. All nodes in the network then interpret the attack instruction AI_n and carry out the corresponding attack activity AA, which damages the networks, information systems and other assets of the party under attack.

Summary of the Invention

The object of the invention is to provide a method for detecting P2P botnet structure based on network flow clustering. The method detects the structure of a P2P botnet by executing, in order, a real-time communication data acquisition module, a datagram record filtering module, a network flow extraction module, a network flow record filtering module, a network flow clustering module, and a data association and result display module. The basic idea is that the defender exploits the regularity of command and control communication between P2P botnet nodes, namely that its duration, number of datagrams, number of bytes and similar quantities are characteristic. By identifying command and control communication in the communication data of the monitored network, the method determines the P2P botnet nodes in that network and the command and control relationships between them, and from these derives the P2P botnet structure. The main innovation of the invention is that network flows with similar features are grouped by a clustering method and compared against the features in the command and control communication feature set, so that normal communication is distinguished from P2P botnet communication and the structure of the P2P botnet is detected.

The method of the invention for detecting P2P botnet structure based on network flow clustering comprises the following detection steps:

Step 1: Collect real-time communication data

The real-time communication data acquisition module first obtains IP datagrams IPD from the monitored network and extracts from each IP datagram IPD the key fields KF = {SIP, DIP, SPT, DPT, IHL, ITL, THL, PTL}. It then records the acquisition time T_t at which the IP datagram IPD was captured. Finally, the source IP address SIP, destination IP address DIP, source port number SPT, destination port number DPT and IP datagram protocol field type PTL from the key fields KF, together with the acquisition time T_t and the application-layer message length AML, are stored as one datagram record PR in the datagram record table PRT. Written as a tuple, the datagram record is PR = (SIP, DIP, SPT, DPT, PTL, T_t, AML).

Step 2: Filter datagram records

The datagram record filtering module filters datagram records PR out of the datagram record table PRT according to the first filtering rule set FFR.

Step 3: Extract network flows

The network flow extraction module first accepts the timeout interval TO entered by the defender. It then processes the datagram records PR in the datagram record table PRT in order of acquisition time T_t, according to the network flow extraction policy FEP.

Step 4: Filter network flow records

The network flow record filtering module filters out irrelevant network flow records according to the second filtering rule set SFR.

Step 5: Cluster network flows

The network flow clustering module first accepts the command and control feature set CCFFT entered by the defender. It then normalizes the network flow features FFT of the network flow records FR in the network flow record table FRT using the min-max method MM. Finally, it clusters the network flow records FR in the network flow record table FRT and takes the one or more clusters whose center points are close to features in the command and control feature set CCFFT as the P2P botnet command and control network flow set CCS.

Step 6: Display the results

The data association and result display module first extracts the IP address set IPS from the command and control network flow set CCS. It then represents each IP address in the IP address set IPS as a point and, for each network flow record FR in the command and control network flow set CCS, draws an edge between the corresponding source IP address SIP and destination IP address DIP. The resulting points and edges constitute the P2P botnet structure detected by the defender.

Description of the Drawings

Figure 1 is a schematic diagram of a conventional P2P botnet structure.

Figure 2 is a schematic diagram of how the invention detects P2P botnet structure.

Detailed Description

The invention is described in further detail below with reference to the drawings and embodiments.

Step 1: Collect real-time communication data

In a first aspect, the real-time communication data acquisition module obtains IP datagrams IPD from the monitored network and extracts the key fields KF from each IP datagram IPD.

The key fields KF comprise the source IP address SIP, destination IP address DIP, source port number SPT, destination port number DPT, IP header length IHL, total IP datagram length ITL, TCP/UDP header length THL, and IP datagram protocol field type PTL. Written as a set, KF = {SIP, DIP, SPT, DPT, IHL, ITL, THL, PTL}.

In the invention, when PTL indicates the TCP protocol, the application-layer message length AML is computed as ITL-IHL-THL.

In the invention, when PTL indicates the UDP protocol, the application-layer message length AML is computed as ITL-IHL-THL+8.

TCP (Transmission Control Protocol) provides reliable end-to-end byte-stream communication over an unreliable internetwork.

UDP (User Datagram Protocol) gives applications a way to send encapsulated IP datagrams without first establishing a connection.

In a second aspect, the real-time communication data acquisition module records the time at which the current IP datagram IPD was captured, denoted T_t (the acquisition time T_t).

In a third aspect, the real-time communication data acquisition module stores the source IP address SIP, destination IP address DIP, source port number SPT, destination port number DPT and IP datagram protocol field type PTL from the key fields KF, together with the acquisition time T_t and the application-layer message length AML, as one datagram record PR in the datagram record table PRT. Written as a tuple, the datagram record is PR = (SIP, DIP, SPT, DPT, PTL, T_t, AML).
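As a rough illustration of Step 1, the sketch below builds one datagram record PR from already-parsed key fields KF. The names `DatagramRecord`, `app_layer_length` and `make_record` are hypothetical, the key fields are assumed to arrive as a dictionary with all lengths expressed in bytes, and the UDP branch simply transcribes the expression given in the description without independent verification.

```python
from typing import NamedTuple

TCP, UDP = 6, 17  # IANA protocol numbers carried in the IP protocol field


class DatagramRecord(NamedTuple):
    """PR = (SIP, DIP, SPT, DPT, PTL, T_t, AML)."""
    sip: str
    dip: str
    spt: int
    dpt: int
    ptl: int
    t: float
    aml: int


def app_layer_length(itl: int, ihl: int, thl: int, ptl: int) -> int:
    """Application-layer message length AML, using the expressions in the text."""
    if ptl == TCP:
        return itl - ihl - thl
    if ptl == UDP:
        # Transcribed literally from the description (ITL-IHL-THL+8).
        return itl - ihl - thl + 8
    raise ValueError(f"unsupported protocol: {ptl}")


def make_record(kf: dict, t: float) -> DatagramRecord:
    """Build one PR from the key fields KF and the acquisition time T_t."""
    aml = app_layer_length(kf["ITL"], kf["IHL"], kf["THL"], kf["PTL"])
    return DatagramRecord(kf["SIP"], kf["DIP"], kf["SPT"], kf["DPT"],
                          kf["PTL"], t, aml)
```

A datagram record table PRT could then be an ordinary list to which each `make_record(...)` result is appended as datagrams are captured.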

Step 2: Filter datagram records

The datagram record filtering module filters datagram records PR out of the datagram record table PRT according to the first filtering rule set FFR.

In the invention, the first filtering rule set FFR comprises, in a first aspect, a protocol type filtering rule PFR. The datagram record filtering module deletes from the datagram record table PRT every datagram record PR whose protocol field type PTL differs from the protocol type filtering rule PFR.

In a second aspect, the first filtering rule set FFR comprises a whitelist filtering rule WLFR, which contains the IP addresses trusted by the defender. The datagram record filtering module deletes from the datagram record table PRT every datagram record PR whose source IP address SIP or destination IP address DIP matches the whitelist filtering rule WLFR.

In a third aspect, the first filtering rule set FFR comprises a blacklist filtering rule BLFR, which contains the IP addresses suspected by the defender. The datagram record filtering module deletes from the datagram record table PRT every datagram record PR whose source IP address SIP or destination IP address DIP differs from the blacklist filtering rule BLFR.
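The three rules of the first filtering rule set FFR can be sketched as a single pass over the datagram record table. This is only one possible reading: the function name `apply_ffr` and the `PR` tuple are hypothetical, the blacklist rule is interpreted as "keep a record only if at least one endpoint is suspected", and an empty blacklist is assumed to mean that the rule is not configured.

```python
from typing import Iterable, List, NamedTuple, Set


class PR(NamedTuple):
    """Datagram record (SIP, DIP, SPT, DPT, PTL, T_t, AML)."""
    sip: str
    dip: str
    spt: int
    dpt: int
    ptl: int
    t: float
    aml: int


def apply_ffr(records: Iterable[PR], protocols: Set[int],
              whitelist: Set[str], blacklist: Set[str]) -> List[PR]:
    """Return the records that survive PFR, WLFR and BLFR."""
    kept = []
    for r in records:
        # PFR: drop records whose protocol is not one of the allowed types.
        if r.ptl not in protocols:
            continue
        # WLFR: drop records that involve a trusted address.
        if r.sip in whitelist or r.dip in whitelist:
            continue
        # BLFR (assumed reading): drop records with no suspected endpoint.
        if blacklist and r.sip not in blacklist and r.dip not in blacklist:
            continue
        kept.append(r)
    return kept
```

For example, `apply_ffr(prt, {6, 17}, {"8.8.8.8"}, set())` would keep TCP and UDP records that do not touch the trusted resolver.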

Step 3: Extract network flows

A network flow record FR comprises the source IP address SIP, destination IP address DIP, source port number SPT, destination port number DPT, protocol type PTL, start time ST, end time ET, datagram count PN, byte count BN, network flow duration DRT, average bytes per datagram BPP, average datagrams per second PPS, and average bytes per second BPS. Written as a tuple, FR = (SIP, DIP, SPT, DPT, PTL, ST, ET, PN, BN, DRT, BPP, PPS, BPS). Network flow records FR are stored in the network flow record table FRT.

The five-tuple FT is the tuple formed by the source IP address SIP, destination IP address DIP, source port number SPT, destination port number DPT and IP datagram protocol field type PTL. Written as a tuple, FT = (SIP, DIP, SPT, DPT, PTL).

The network flow features FFT comprise the network flow duration DRT, average bytes per datagram BPP, average datagrams per second PPS, and average bytes per second BPS. Written as a tuple, FFT = (DRT, BPP, PPS, BPS).

The network flow features FFT are computed according to the network flow feature calculation policy FFTCP.

The network flow feature calculation policy FFTCP computes the network flow duration DRT from ST-ET, the average bytes per datagram BPP from BN/PN, the average datagrams per second PPS from PN/DRT, and the average bytes per second BPS from BN/DRT.
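The feature calculation can be sketched in a few lines. The function name `flow_features` is hypothetical. Two choices here are assumptions rather than statements from the text: the duration is taken as ET minus ST so that it is non-negative (the text writes the difference as ST-ET), and the two per-second rates are set to 0.0 for a zero-length flow to avoid dividing by zero.

```python
from typing import Tuple


def flow_features(st: float, et: float, pn: int, bn: int) -> Tuple[float, float, float, float]:
    """Return FFT = (DRT, BPP, PPS, BPS) for one network flow record."""
    drt = et - st                 # duration, assumed to be end minus start
    bpp = bn / pn                 # average bytes per datagram
    if drt > 0:
        pps = pn / drt            # average datagrams per second
        bps = bn / drt            # average bytes per second
    else:
        pps = bps = 0.0           # assumption: single-instant flows get zero rates
    return (drt, bpp, pps, bps)
```

A flow that starts at 10 s, ends at 14 s and carries 8 datagrams totalling 400 bytes yields a duration of 4 s, 50 bytes per datagram, 2 datagrams per second and 100 bytes per second.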

In a first aspect, the network flow extraction module accepts the timeout interval entered by the defender, denoted TO (the timeout TO).

In a second aspect, the network flow extraction module processes the datagram records PR in the datagram record table PRT in order of acquisition time T_t, according to the network flow extraction policy FEP.

In the invention, the network flow extraction policy FEP first searches the network flow record table FRT for the network flow record FR that has the same five-tuple FT as the datagram record PR and the largest start time ST.

If such a network flow record FR exists, and the acquisition time T_t of the datagram record PR and the start time ST of the network flow record FR satisfy T_t-ST less than or equal to the timeout TO, the network flow record FR is updated from the datagram record PR: its end time ET is set to the acquisition time T_t of the datagram record PR; its datagram count PN is incremented by 1; its byte count BN is increased by the application-layer message length AML of the datagram record PR; and its network flow duration DRT, average bytes per datagram BPP, average datagrams per second PPS and average bytes per second BPS are recomputed according to the network flow feature calculation policy FFTCP.

If such a network flow record FR exists, but T_t-ST is greater than the timeout TO, a new network flow record FR is inserted into the network flow record table FRT: its start time ST is set to the acquisition time T_t of the datagram record PR; its datagram count PN is set to 1; its byte count BN is set to the application-layer message length AML of the datagram record PR; and its network flow duration DRT, average bytes per datagram BPP, average datagrams per second PPS and average bytes per second BPS are computed according to the network flow feature calculation policy FFTCP.

If no such network flow record FR exists, a new network flow record FR is likewise inserted into the network flow record table FRT: its start time ST is set to the acquisition time T_t of the datagram record PR; its datagram count PN is set to 1; its byte count BN is set to the application-layer message length AML of the datagram record PR; and its network flow duration DRT, average bytes per datagram BPP, average datagrams per second PPS and average bytes per second BPS are computed according to the network flow feature calculation policy FFTCP.
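The three cases of the extraction policy FEP can be sketched as follows. The names `Flow` and `extract_flows` are hypothetical, datagram records are assumed to be plain tuples in the order (SIP, DIP, SPT, DPT, PTL, T_t, AML), and a new flow's end time is initialised to its start time, which the text does not state explicitly. The derived features are left out here and can be computed from ST, ET, PN and BN whenever they are needed.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass
class Flow:
    ft: Tuple      # five-tuple FT = (SIP, DIP, SPT, DPT, PTL)
    st: float      # start time ST
    et: float      # end time ET
    pn: int        # datagram count PN
    bn: int        # byte count BN


def extract_flows(records: Iterable[Tuple], timeout: float) -> List[Flow]:
    """Group datagram records into flows, following the three FEP cases."""
    flows: List[Flow] = []
    latest: Dict[Tuple, int] = {}  # five-tuple -> index of its flow with largest ST
    for rec in sorted(records, key=lambda r: r[5]):  # process in T_t order
        ft, t, aml = tuple(rec[:5]), rec[5], rec[6]
        idx = latest.get(ft)
        if idx is not None and t - flows[idx].st <= timeout:
            # Case 1: matching flow within the timeout, so update it.
            flow = flows[idx]
            flow.et = t
            flow.pn += 1
            flow.bn += aml
        else:
            # Cases 2 and 3: timeout exceeded or no matching flow, so insert.
            latest[ft] = len(flows)
            flows.append(Flow(ft, t, t, 1, aml))
    return flows
```

Because records are processed in time order, the most recently inserted flow for a five-tuple is always the one with the largest start time, so a dictionary lookup stands in for the table search.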

Step 4: Filter network flow records

The network flow record filtering module filters out irrelevant network flow records according to the second filtering rule set SFR.

In the invention, the second filtering rule set SFR comprises, in a first aspect, a special communication filtering rule SCFR, which specifies a datagram count PN of 1 and a byte count BN of 0. The network flow record filtering module deletes from the network flow record table FRT every network flow record FR that matches the special communication filtering rule SCFR.

In a second aspect, the second filtering rule set SFR comprises a P2P communication filtering rule PPFR, which specifies a particular value of the network flow features FFT. The network flow record filtering module deletes from the network flow record table FRT every network flow record FR whose network flow features FFT match the P2P communication filtering rule PPFR.
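A minimal sketch of the second filtering rule set SFR follows. The function name `apply_sfr` is hypothetical, flows are assumed to be dictionaries with keys `PN`, `BN` and `FFT`, the special communication rule is read as "PN is 1 and BN is 0 at the same time", and feature tuples are compared with a small floating-point tolerance instead of strict equality. All of these are assumptions for illustration.

```python
import math
from typing import Dict, Iterable, List, Sequence


def apply_sfr(flows: Iterable[Dict], p2p_features: Sequence[Sequence[float]],
              rel_tol: float = 1e-9) -> List[Dict]:
    """Return the flows that survive SCFR and PPFR."""

    def matches(fft: Sequence[float], ref: Sequence[float]) -> bool:
        # Component-wise comparison of two (DRT, BPP, PPS, BPS) tuples.
        return all(math.isclose(a, b, rel_tol=rel_tol) for a, b in zip(fft, ref))

    kept = []
    for flow in flows:
        # SCFR: a single datagram with no application-layer payload.
        if flow["PN"] == 1 and flow["BN"] == 0:
            continue
        # PPFR: features equal to a known benign P2P feature value.
        if any(matches(flow["FFT"], ref) for ref in p2p_features):
            continue
        kept.append(flow)
    return kept
```

The `p2p_features` argument would hold whatever feature values the defender has configured for ordinary P2P traffic that should be excluded before clustering.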

Step 5: Cluster network flows

In a first aspect, the network flow clustering module accepts the command and control network flow feature set entered by the defender, denoted CCFFT (the command and control feature set CCFFT).

In a second aspect, the network flow clustering module normalizes the network flow features FFT of the network flow records FR in the network flow record table FRT using the min-max method MM.

The min-max method MM takes the minimum value MIN and the maximum value MAX of the data set and then replaces each datum D in the data set with (D-MIN)/(MAX-MIN).
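The min-max method can be sketched as below. The function name `min_max_normalize` is hypothetical. It is assumed that each of the four features is normalized separately over all flows, and that a feature whose values are all equal is mapped to 0.0 so that the division by MAX-MIN is never a division by zero.

```python
from typing import List, Sequence, Tuple


def min_max_normalize(rows: Sequence[Sequence[float]]) -> List[Tuple[float, ...]]:
    """Scale every feature column of `rows` into the range [0, 1]."""
    if not rows:
        return []
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]    # MIN per feature
    highs = [max(col) for col in columns]   # MAX per feature
    normalized = []
    for row in rows:
        normalized.append(tuple(
            (value - lo) / (hi - lo) if hi > lo else 0.0  # constant column -> 0.0
            for value, lo, hi in zip(row, lows, highs)
        ))
    return normalized
```

Normalizing first keeps a feature with a large numeric range, such as bytes per second, from dominating the distance computations used during clustering.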

In a third aspect, the network flow clustering module clusters the network flow records FR in the network flow record table FRT and takes the one or more clusters whose center points are close to features in the command and control feature set CCFFT as the P2P botnet command and control network flow set CCS.
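The text does not name a clustering algorithm or define how close a center must be to a feature. Purely as an illustration, the sketch below uses plain k-means with caller-supplied initial centers and a Euclidean distance threshold `max_dist`. Both choices, and the names `assign`, `kmeans` and `select_cc_clusters`, are assumptions and not part of the described method.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def assign(points: Sequence[Point], centers: Sequence[Point]) -> List[int]:
    """Label each point with the index of its nearest center."""
    return [min(range(len(centers)), key=lambda k: math.dist(p, centers[k]))
            for p in points]


def kmeans(points: Sequence[Point], centers: Sequence[Point],
           iterations: int = 20) -> Tuple[List[Point], List[int]]:
    """Lloyd's k-means, deterministic because initial centers are given."""
    centers = [tuple(c) for c in centers]
    for _ in range(iterations):
        labels = assign(points, centers)
        updated = []
        for k, old in enumerate(centers):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:
                updated.append(tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                updated.append(old)  # keep an empty cluster's center unchanged
        if updated == centers:
            break
        centers = updated
    return centers, assign(points, centers)


def select_cc_clusters(centers: Sequence[Point], ccfft: Sequence[Point],
                       max_dist: float) -> List[int]:
    """Indices of clusters whose center lies within max_dist of a CCFFT feature."""
    return [k for k, c in enumerate(centers)
            if any(math.dist(c, f) <= max_dist for f in ccfft)]
```

The command and control network flow set CCS would then be the flows whose label appears in the list returned by `select_cc_clusters`. The features in CCFFT need to be on the same normalized scale as the clustered points for the distance comparison to be meaningful.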

Step 6: Display the results

In a first aspect, the data association and result display module extracts the source IP addresses SIP and destination IP addresses DIP from the command and control network flow set CCS, denoted the command and control IP address set IPS (the IP address set IPS).

In a second aspect, the data association and result display module represents each IP address in the IP address set IPS as a point and, for each network flow record FR in the command and control network flow set CCS, draws an edge between the corresponding source IP address SIP and destination IP address DIP. The resulting points and edges constitute the P2P botnet structure detected by the defender.
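The data association step amounts to building a graph from the flows in CCS. In the sketch below the function name `build_structure` is hypothetical, each flow is assumed to be a tuple whose first two fields are SIP and DIP, and repeated flows between the same pair of addresses are collapsed into one edge, which is an assumption about how the result is drawn.

```python
from typing import Iterable, List, Sequence, Tuple


def build_structure(ccs: Iterable[Sequence]) -> Tuple[List[str], List[Tuple[str, str]]]:
    """Return (points, edges) describing the detected botnet structure."""
    nodes = set()   # IP address set IPS
    edges = set()   # one (SIP, DIP) pair per communicating pair of nodes
    for flow in ccs:
        sip, dip = flow[0], flow[1]
        nodes.update((sip, dip))
        edges.add((sip, dip))
    return sorted(nodes), sorted(edges)
```

The returned lists can be handed to any graph drawing tool to render the structure that the defender inspects.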

The advantages of the method of the invention for detecting P2P botnet structure based on network flow clustering are:

① It exploits the characteristics of P2P botnets by using nodes in the network as monitored nodes (a watchlist) or as filtering rules for the data to be analyzed, which reduces the amount of data to process.

② It describes traffic in terms of network flows, which broadens the applicability of the method: it can handle command and control communication over both the TCP protocol and the UDP protocol.

③ The method combines network packet filtering with network flow filtering. Packets are filtered first, which reduces the input to the extraction of network flows and their features and makes that extraction more efficient. Network flows are then filtered, which reduces the volume of flow data and makes the clustering analysis more efficient.

④ By detecting and presenting the P2P botnet structure, the method helps network defenders understand how P2P botnets work and, in turn, devise more effective defensive measures.

Claims

1. A method for detecting P2P botnet structure based on network flow clustering, characterized in that the method comprises the following detection steps:

Step 1: Collect real-time communication data

The real-time communication data acquisition module first obtains IP datagrams IPD from the monitored network and extracts from each IP datagram IPD the key fields KF = {SIP, DIP, SPT, DPT, IHL, ITL, THL, PTL}, where IHL is the IP header length, ITL is the total IP datagram length and THL is the TCP/UDP header length; it then records the acquisition time T_t of the currently captured IP datagram IPD; finally, the source IP address SIP, destination IP address DIP, source port number SPT, destination port number DPT and IP datagram protocol field type PTL from the key fields KF, together with the acquisition time T_t and the application-layer message length AML, are stored as one datagram record PR in the datagram record table PRT; written as a tuple, the datagram record is PR = (SIP, DIP, SPT, DPT, PTL, T_t, AML);

Step 2: Filter datagram records

The datagram record filtering module filters datagram records PR out of the datagram record table PRT according to the first filtering rule set FFR;

The first filtering rule set FFR comprises, in a first aspect, a protocol type filtering rule PFR; the datagram record filtering module deletes from the datagram record table PRT every datagram record PR whose protocol field type PTL differs from the protocol type filtering rule PFR; in a second aspect, it comprises a whitelist filtering rule WLFR, which contains the IP addresses trusted by the defender; the datagram record filtering module deletes from the datagram record table PRT every datagram record PR whose source IP address SIP or destination IP address DIP matches the whitelist filtering rule WLFR; in a third aspect, it comprises a blacklist filtering rule BLFR, which contains the IP addresses suspected by the defender; the datagram record filtering module deletes from the datagram record table PRT every datagram record PR whose source IP address SIP or destination IP address DIP differs from the blacklist filtering rule BLFR;

Step 3: Extract network flows

The network flow extraction module first accepts the timeout interval TO entered by the defender; it then processes the datagram records PR in the datagram record table PRT in order of acquisition time T_t, according to the network flow extraction policy FEP;

Step 4: Filter network flow records

The network flow record filtering module filters out irrelevant network flow records according to the second filtering rule set SFR; the second filtering rule set SFR comprises, in a first aspect, a special communication filtering rule SCFR, which specifies a datagram count PN of 1 and a byte count BN of 0; the network flow record filtering module deletes from the network flow record table FRT every network flow record FR that matches the special communication filtering rule SCFR; in a second aspect, it comprises a P2P communication filtering rule PPFR, which specifies a particular value of the network flow features FFT; the network flow record filtering module deletes from the network flow record table FRT every network flow record FR whose network flow features FFT match the P2P communication filtering rule PPFR;

Step 5: Cluster network flows

The network flow clustering module first accepts the command and control feature set CCFFT entered by the defender; it then normalizes the network flow features FFT of the network flow records FR in the network flow record table FRT using the min-max method MM; finally, it clusters the network flow records FR in the network flow record table FRT and takes the one or more clusters whose center points are close to features in the command and control feature set CCFFT as the P2P botnet command and control network flow set CCS; the min-max method MM takes the minimum value MIN and the maximum value MAX of the data set, and each datum D in the data set is then set to (D-MIN)/(MAX-MIN);

Step 6: Display the results

The data association and result display module first extracts the IP address set IPS from the command and control network flow set CCS; it then represents each IP address in the IP address set IPS as a point and, for each network flow record FR in the command and control network flow set CCS, draws an edge between the corresponding source IP address SIP and destination IP address DIP; the resulting points and edges constitute the P2P botnet structure detected by the defender.

2. The method for detecting P2P botnet structure based on network flow clustering according to claim 1, characterized in that: in step 1, when PTL indicates the TCP protocol, the application-layer message length AML is computed as ITL-IHL-THL.

3. The method for detecting P2P botnet structure based on network flow clustering according to claim 1, characterized in that: in step 1, when PTL indicates the UDP protocol, the application-layer message length AML is computed as ITL-IHL-THL+8.

4. The method for detecting P2P botnet structure based on network flow clustering according to claim 1, characterized in that: in step 4, the network flow record FR comprises the source IP address SIP, destination IP address DIP, source port number SPT, destination port number DPT, protocol type PTL, start time ST, end time ET, datagram count PN, byte count BN, network flow duration DRT, average bytes per datagram BPP, average datagrams per second PPS, and average bytes per second BPS; written as a tuple, the network flow record is FR = (SIP, DIP, SPT, DPT, PTL, ST, ET, PN, BN, DRT, BPP, PPS, BPS); the network flow record FR is stored in the network flow record table FRT.

5. The method for detecting P2P botnet structure based on network flow clustering according to claim 1, characterized in that: in step 4, the network flow features FFT comprise the network flow duration DRT, average bytes per datagram BPP, average datagrams per second PPS, and average bytes per second BPS; written as a tuple, the network flow features are FFT = (DRT, BPP, PPS, BPS).

6. The method for detecting P2P botnet structure based on network flow clustering according to claim 1 or 5, characterized in that: in step 4, the network flow features FFT are computed according to the network flow feature calculation policy FFTCP; the network flow feature calculation policy FFTCP computes the network flow duration DRT from ST-ET, the average bytes per datagram BPP from BN/PN, the average datagrams per second PPS from PN/DRT, and the average bytes per second BPS from BN/DRT.