CN115514537A - Method and system for judging suspicious traffic in encrypted traffic

Info

Publication number: CN115514537A
Application number: CN202211070466.XA
Authority: CN
Inventors: 卢国鸣
Original assignee: Shanghai Xingrong Information Technology Co ltd
Current assignee: Shanghai Xingrong Information Technology Co ltd
Priority date: 2022-09-02
Filing date: 2022-09-02
Publication date: 2022-12-23
Anticipated expiration: 2042-09-02
Also published as: CN115514537B

Abstract

The embodiments of this specification provide a method and system for judging suspicious traffic in encrypted traffic. The method includes: collecting encrypted traffic to be tested, and extracting encrypted traffic features of the encrypted traffic to be tested, wherein the encrypted traffic features include a first traffic feature, and the first traffic feature includes access feature information, protocol feature information, and transfer feature information; and determining, based on the encrypted traffic features of the encrypted traffic to be tested, the traffic type of the encrypted traffic to be tested. The traffic type includes normal traffic and suspicious traffic, and the suspicious traffic is used for subsequent decryption analysis of the encrypted traffic to be tested.

Description

Method and system for judging suspicious traffic in encrypted traffic

Technical field

This specification relates to the field of network security, and in particular to a method and system for judging suspicious traffic in encrypted traffic.

Background

As the Internet develops, people's awareness of privacy is increasing, and the demand for traffic encryption keeps growing. However, while encryption protects privacy, it also makes it easier to hide malicious traffic. Encrypted malicious traffic conceals many known and unknown threats. A method for judging suspicious traffic in encrypted traffic that improves network security protection capabilities is therefore needed.

Summary of the invention

One or more embodiments of this specification provide a method for judging suspicious traffic in encrypted traffic. The method includes: collecting encrypted traffic to be tested, and extracting encrypted traffic features of the encrypted traffic to be tested, wherein the encrypted traffic features include a first traffic feature, and the first traffic feature includes access feature information, protocol feature information, and transfer feature information; and determining, based on the encrypted traffic features of the encrypted traffic to be tested, the traffic type of the encrypted traffic to be tested, wherein the traffic type includes normal traffic and suspicious traffic, and the suspicious traffic is used for subsequent decryption analysis of the encrypted traffic to be tested.

One or more embodiments of this specification provide a system for judging suspicious traffic in encrypted traffic. The system includes: a traffic collection module, configured to collect encrypted traffic to be tested and extract encrypted traffic features of the encrypted traffic to be tested, wherein the encrypted traffic features include a first traffic feature, and the first traffic feature includes access feature information, protocol feature information, and transfer feature information; and a type determination module, configured to determine, based on the encrypted traffic features of the encrypted traffic to be tested, the traffic type of the encrypted traffic to be tested, wherein the traffic type includes normal traffic and suspicious traffic, and the suspicious traffic is used for subsequent decryption analysis of the encrypted traffic to be tested.

One or more embodiments of this specification provide an apparatus for judging suspicious traffic in encrypted traffic, including a processor configured to execute at least some of a set of computer instructions to implement the method for judging suspicious traffic in encrypted traffic.

One or more embodiments of this specification provide a computer-readable storage medium that stores computer instructions; when the computer instructions are executed by a processor, the method for judging suspicious traffic in encrypted traffic is implemented.

Description of drawings

This specification is further illustrated by way of exemplary embodiments, which are described in detail with reference to the accompanying drawings. These embodiments are non-limiting, and in them the same reference number denotes the same structure, wherein:

Fig. 1 is a schematic diagram of an application scenario of a system for judging suspicious traffic in encrypted traffic according to some embodiments of this specification;

Fig. 2 is an exemplary block diagram of a system for judging suspicious traffic in encrypted traffic according to some embodiments of this specification;

Fig. 3 is an exemplary flowchart of a method for judging suspicious traffic in encrypted traffic according to some embodiments of this specification;

Fig. 4 is an exemplary schematic diagram of obtaining a second traffic feature through a relationship graph according to some embodiments of this specification;

Fig. 5 is a schematic diagram of a suspicious traffic identification model according to some embodiments of this specification.

Detailed description

To explain the technical solutions of the embodiments of this specification more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below are only some examples or embodiments of this specification; a person of ordinary skill in the art can, without creative effort, apply this specification to other similar scenarios on the basis of these drawings. Unless apparent from context or otherwise stated, the same reference numerals in the figures denote the same structure or operation.

It should be understood that "system", "device", "unit", and/or "module" as used herein are a way of distinguishing different components, elements, parts, portions, or assemblies at different levels. However, these words may be replaced by other expressions that achieve the same purpose.

As used in this specification and the claims, the words "a", "an", "one", and/or "the" do not refer specifically to the singular and may also include the plural, unless the context clearly indicates otherwise. In general, the terms "comprising" and "including" only indicate that the explicitly identified steps and elements are included; these steps and elements do not form an exclusive list, and a method or device may also contain other steps or elements.

Flowcharts are used in this specification to illustrate the operations performed by a system according to embodiments of this specification. It should be understood that the operations are not necessarily performed in the exact order shown. Instead, the steps may be processed in reverse order or simultaneously. Other operations may also be added to these processes, or one or more steps may be removed from them.

Fig. 1 is a schematic diagram of an application scenario of a system for judging suspicious traffic in encrypted traffic according to some embodiments of this specification.

DPI (Deep Packet Inspection) means that a device inspects and analyzes traffic and packet content at key points of the network and can filter and control the inspected traffic according to predefined policies. On the link where it is deployed, DPI can provide fine-grained service identification, analysis of service traffic direction, statistics on the share of each service's traffic, shaping of service shares, protection against application-layer denial-of-service attacks, filtering of viruses and Trojans, and control of P2P abuse. For example, a decryption DPI module may perform subsequent decryption analysis on the encrypted traffic to be tested when the traffic type of that traffic is determined to be suspicious traffic.

As shown in Fig. 1, the application scenario 100 may include a network 110, a router 120, a processor 130, encrypted traffic 140, and a traffic judgment result 150. The router 120 can obtain the encrypted traffic 140 to be tested from the network 110, and the processor 130 can copy the encrypted traffic 140 from the router 120 in order to collect it and generate the traffic judgment result 150.

The network 110 may include any suitable network that facilitates the exchange of information and/or data in the application scenario 100. The router 120 in the application scenario 100 may exchange information and/or data with the network 110. For example, the network 110 may send user-generated traffic information to the router 120. In some embodiments, the network 110 may be any one or more of a wired network or a wireless network. In some embodiments, the network 110 may include one or more network access points. For example, the network 110 may include wired or wireless network access points. In some embodiments, the network may have a point-to-point, shared, or centralized topology, or a combination of several topologies.

The router 120 may be a network device that reads the address in a data packet and then stores, groups, and forwards the packet. In some embodiments, the router 120 may be used to connect two or more networks 110. In some embodiments, the router 120 receives the encrypted traffic 140 from the network 110 and then forwards the encrypted traffic 140 stored in the router 120 to the processor 130. The router 120 may be local or remote.

The processor 130 may include a device that executes the method for judging suspicious traffic in the encrypted traffic 140. It can process data and/or information obtained from the router 120, execute the judging method provided in this specification on the relevant data, and generate the traffic judgment result 150. For example, the processor 130 may determine traffic features from the encrypted traffic information received by the router 120, judge on the basis of those features whether the encrypted traffic 140 is suspicious traffic, and then generate the traffic judgment result 150. In some embodiments, the processor 130 may be a single server or a server group. In some embodiments, the processor 130 may be integrated into the suspicious traffic judging system (for example, integrated in the router 120). The processor 130 may be local or remote. The processor 130 may be implemented on a cloud platform.

Traffic may be the traffic generated by a user while using the Internet. In some embodiments, traffic may be encrypted or unencrypted. Traffic is encrypted to counter eavesdropping and man-in-the-middle attacks, so that web pages are essentially protected from tampering and users can browse safely. However, some malicious traffic may still be hidden in the encrypted traffic 140. In some embodiments, the judgment of malicious traffic is performed by the processor 130. For example, after encrypted traffic 140 containing malicious traffic is transmitted by the network 110 to the router, the processor 130 judges it to be suspicious traffic.

The traffic judgment result 150 may indicate either that the encrypted traffic 140 is suspicious traffic or that the encrypted traffic 140 is normal traffic. In some embodiments, the traffic judgment result 150 is produced by the processor 130.

It should be noted that the application scenario 100 is provided for illustrative purposes only and is not intended to limit the scope of the present application. A person of ordinary skill in the art can make various modifications or changes based on the description in this specification. For example, the application scenario 100 may also include information sources. Such changes and modifications do not depart from the scope of the present application.

Fig. 2 is a block diagram of a system for judging suspicious traffic in encrypted traffic according to some embodiments of this specification.

As shown in Fig. 2, in some embodiments, the suspicious traffic judging system 200 may include a traffic feature acquisition module 210 and a traffic type determination module 220.

The traffic feature acquisition module 210 may be used to collect the encrypted traffic to be tested and extract its encrypted traffic features. In some embodiments, the encrypted traffic features may include a first traffic feature, and the first traffic feature may include access feature information, protocol feature information, and transfer feature information. For details on acquiring traffic features, see step 310 and its related description.

The traffic type determination module 220 may be configured to determine the traffic type of the encrypted traffic to be tested based on its encrypted traffic features. In some embodiments, the traffic type may include normal traffic and suspicious traffic. Suspicious traffic is used for subsequent decryption analysis of the encrypted traffic to be tested. For details on determining the traffic type, see step 320 and its related description.

In some embodiments, the traffic type determination module 220 may be further configured to process the encrypted traffic features of the encrypted traffic to be tested with a suspicious traffic identification model and thereby determine the traffic type of the encrypted traffic to be tested. The suspicious traffic identification model is a machine learning model. For details on the suspicious traffic identification model, see Fig. 5 and its related description.

In some embodiments, the suspicious traffic judging system 200 may further include a decryption DPI module 230, configured to perform subsequent decryption analysis on the encrypted traffic to be tested in response to the traffic type of that traffic being suspicious traffic.
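The interplay of the modules described above can be sketched in Python as follows. This is only an illustrative sketch: the names `TrafficType`, `handle_flow`, `classify`, and `decrypt_and_inspect` are hypothetical and do not appear in this specification, and the classifier and the decryption DPI step are passed in as plain callables rather than implemented.

```python
from enum import Enum
from typing import Any, Callable, Mapping


class TrafficType(Enum):
    NORMAL = "normal"
    SUSPICIOUS = "suspicious"


def handle_flow(
    features: Mapping[str, Any],
    classify: Callable[[Mapping[str, Any]], TrafficType],
    decrypt_and_inspect: Callable[[Mapping[str, Any]], None],
) -> TrafficType:
    """Determine the traffic type and hand suspicious flows to decryption DPI.

    `classify` stands in for the traffic type determination module and
    `decrypt_and_inspect` for the decryption DPI module (both hypothetical).
    """
    traffic_type = classify(features)
    if traffic_type is TrafficType.SUSPICIOUS:
        # Only suspicious traffic goes on to decryption analysis.
        decrypt_and_inspect(features)
    return traffic_type
```

Passing the two stages in as callables keeps the sketch independent of any particular model or DPI engine; normal traffic is returned without being decrypted.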

It should be noted that the above description of the suspicious traffic judging system 200 and its modules is for convenience of description only and does not limit this specification to the scope of the illustrated embodiments. A person skilled in the art who understands the principle of the system may combine the modules arbitrarily, or form a subsystem connected to other modules, without departing from that principle. In some embodiments, the traffic feature acquisition module 210 and the traffic type determination module 220 disclosed in Fig. 2 may be different modules in one system, or a single module may implement the functions of two or more of the above modules. For example, the modules may share one storage module, or each module may have its own storage module. Such variations are all within the protection scope of this specification.

Fig. 3 is an exemplary flowchart of a method for judging suspicious traffic in encrypted traffic according to some embodiments of this specification. In some embodiments, the process 300 may be executed by the processor 130. As shown in Fig. 3, the process 300 includes the following steps.

Step 310: collect the encrypted traffic to be tested, and extract the encrypted traffic features of the encrypted traffic to be tested.

Encrypted traffic is network traffic that has been encrypted. It may be encrypted by the user or by the service provider in order to protect privacy. For example, users who handle business online over the Internet can rely on the encryption mechanisms of mobile, cloud, and web applications, which use keys, certificates, and similar means in the data encryption process to ensure security and establish trust.

The basic process of data encryption is to transform a file or data (traffic) that was originally plaintext with an algorithm into an unreadable form, usually called "ciphertext". Encrypting data in this way protects it from being illegally stolen or read. In some embodiments, encrypted traffic may include encrypted normal traffic and encrypted suspicious traffic. Encrypted suspicious traffic often disguises or hides the features of malicious traffic. For example, it may disguise or hide Trojan horses, file-infecting viruses, worms, malicious downloaders, and other threats that attack a server and cause problems such as server crashes.

Encrypted traffic features are traffic features related to the encrypted traffic to be tested. Traffic features may include five-tuple information, encryption protocol information, and statistics such as the average packet size and the average packet sending interval. The five-tuple information consists of the source IP address, source port, destination IP address, destination port, and transport layer protocol. Encryption protocol information refers to the protocol-related messages exchanged while the server and the client establish secure communication during the authentication process. The authentication process includes: the client sends a message to the server; the server responds with a message that authenticates itself; the client and the server complete the key exchange, ending the authentication process. In some embodiments, the encryption protocol information may include the TLS/SSL protocol version, extension fields, and the like. The average packet size is the average length, in bytes, of the data in a number of packets; for example, the average length of ten IP packets is 1000 bytes. The average packet sending interval is the average time between sending the current data frame and sending the next data frame during transmission; for example, a data frame is sent every 2 seconds on average.
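As a rough illustration of the statistics described above, the following Python sketch groups packet records by five-tuple and computes the average packet size and average sending interval for each flow. The `Packet` record and the `flow_features` function are hypothetical names introduced for this example; real capture tools expose their own data structures.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Packet:
    """Hypothetical packet record; field names are assumptions."""
    timestamp: float  # send time in seconds
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    protocol: str     # transport layer protocol, e.g. "TCP"
    length: int       # packet length in bytes


def flow_features(packets):
    """Group packets by five-tuple and compute per-flow statistics."""
    flows = {}
    for p in sorted(packets, key=lambda p: p.timestamp):
        key = (p.src_ip, p.src_port, p.dst_ip, p.dst_port, p.protocol)
        flows.setdefault(key, []).append(p)

    features = {}
    for key, pkts in flows.items():
        # Time gaps between consecutive packets of the same flow.
        gaps = [b.timestamp - a.timestamp for a, b in zip(pkts, pkts[1:])]
        features[key] = {
            "mean_packet_size": mean(p.length for p in pkts),
            # A single-packet flow has no interval to average.
            "mean_send_interval": mean(gaps) if gaps else None,
        }
    return features
```

The five-tuple serves as the dictionary key, so packets belonging to different sessions are never mixed when the averages are computed.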

In some embodiments, the encrypted traffic features may include a first traffic feature, which is a feature related to the content contained in the encrypted traffic to be tested. In some embodiments, the first traffic feature may include access feature information, protocol feature information, and transfer feature information.

Access feature information is feature information related to access. Access can refer to the process in which a visitor actively uses a network platform to search for a specific purpose; traffic is generated during this process. For example, encrypted traffic may be generated when a visitor clicks a bookmarked website URL or types a URL directly into the browser address bar. Access feature information can be used to distinguish different sessions, such as communications between different users. In some embodiments, the access feature information may include the source IP address, source port, destination IP address, destination port, and other information.

Whether encrypted traffic is suspicious can be judged on the basis of the access feature information. For example, if the number of visits from the IP address corresponding to a certain browser is 500% higher in one day than it has been historically, it is necessary to check whether suspicious traffic caused the increase.
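The access-volume check in the example above can be expressed as a simple percentage comparison. The sketch below is illustrative only: the function names are hypothetical, and the 500% default threshold is taken from the example rather than being a value prescribed by this specification.

```python
def access_increase_percent(baseline_count: int, current_count: int) -> float:
    """Percentage by which the current access count exceeds the baseline."""
    if baseline_count <= 0:
        raise ValueError("baseline_count must be positive")
    return (current_count - baseline_count) / baseline_count * 100.0


def needs_review(baseline_count: int, current_count: int,
                 threshold_percent: float = 500.0) -> bool:
    """Flag an IP address whose daily access count rose by the threshold or more."""
    return access_increase_percent(baseline_count, current_count) >= threshold_percent
```

With a baseline of 1000 visits per day, 6000 visits in one day is an increase of 500% and would be flagged for review, whereas 2000 visits (an increase of 100%) would not.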

Protocol feature information is feature information related to the protocol. It can be used to distinguish how network traffic is transmitted, for example whether the transmission is encrypted. In some embodiments, the protocol feature information may include transport protocol information, encryption protocol information, and the like.

The probability that encrypted traffic is suspicious can be judged on the basis of the protocol feature information. For example, statistics on historical malicious traffic can show which encrypted transport protocols malicious traffic tends to hide in; by identifying the encryption protocol used by the encrypted traffic, such as Secure Sockets Layer (SSL), it can be determined whether that protocol is one that malicious traffic exploits more readily.

Transfer feature information is feature information related to the transfer of information. In some embodiments, the transfer feature information may include the average packet size, the average packet sending interval, and the like.

Whether encrypted traffic is suspicious can be judged on the basis of the transfer feature information. For example, suppose the normal access hours of a network platform are 8:00-18:00, the average packet size is 512 bytes, and the average packet sending interval is 20 ms. If on a certain day a large number of dense accesses occur between 02:00 and 04:00, with an average packet size of 1500 bytes and an average sending interval that drops abnormally to 6 ms, this abnormal traffic can be determined to be suspicious traffic.
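A rule-based check along the lines of this example might look like the following Python sketch. The baseline values come from the example above; the `TransferBaseline` class, the `transfer_deviations` function, and the tolerance ratios `size_ratio` and `interval_ratio` are assumptions made purely for illustration and are not specified in this document.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TransferBaseline:
    """Normal behaviour of the platform, taken from the example in the text."""
    active_start_hour: int = 8
    active_end_hour: int = 18
    mean_packet_size: float = 512.0      # bytes
    mean_send_interval_ms: float = 20.0  # milliseconds


def transfer_deviations(hour, mean_size, mean_interval_ms,
                        baseline=TransferBaseline(),
                        size_ratio=2.0, interval_ratio=0.5):
    """Return the reasons why observed transfer features look abnormal.

    size_ratio and interval_ratio are hypothetical tolerance factors.
    """
    reasons = []
    if not (baseline.active_start_hour <= hour < baseline.active_end_hour):
        reasons.append("outside normal access hours")
    if mean_size > size_ratio * baseline.mean_packet_size:
        reasons.append("mean packet size unusually large")
    if mean_interval_ms < interval_ratio * baseline.mean_send_interval_ms:
        reasons.append("mean send interval unusually short")
    return reasons
```

For the traffic in the example (03:00, 1500 bytes, 6 ms) all three conditions are met, while traffic at 10:00 with 512-byte packets every 20 ms produces an empty list.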

In some embodiments, the first traffic feature may further include a byte distribution probability vector.

In the field of computer security, data is transmitted from the sender to the receiver in the form of packets. A packet contains a header, and the data sent by the sender is called the payload; the receiver can determine the payload size by subtracting the IP header length from the total length of the IP packet. The header is attached to the payload for transmission and is discarded once the packet reaches its destination. The payload is the main vehicle through which malicious traffic spreads viruses. Malicious payloads may destroy data, deliver messages with abusive text, or send bulk e-mail to large numbers of recipients.

Byte distribution refers to the count of each byte value in the packet payload. For example, the byte distribution of a packet may be: in the payload of the packet, the byte value "00000001" appears 10 times, the byte value "00000011" appears 15 times, ..., and the byte value "11111111" appears 5 times. The byte distribution probability is the probability with which each byte value occurs in the payload of the packet. In some embodiments, the frequency of each byte value can be used to approximate its probability. The byte distribution probability vector is the vector formed by the probabilities with which each of the 256 values a byte can take appears in the data stream. This vector carries a great deal of information about data encoding and data padding, and much of the illegitimate behavior of malicious traffic is hidden in this information. In some embodiments, the count of each byte value can be divided by the total number of bytes in the payload to obtain the byte distribution frequency, which is used to represent the byte distribution probability; the feature is finally expressed as a 1*256-dimensional byte distribution probability vector.

For example, malicious traffic may use certain fields of the HTTP header (for example, content-type, server, etc.) to carry out malicious activities, which means that HTTP fields can be a good indicator of such activities. The HTTP context flows are all HTTP flows sent from the same source IP address within a 5-minute window of a TLS (Transport Layer Security) flow. A feature vector of binary variables represents all observed HTTP header information: if any HTTP flow has a specific header value (i.e., a header value associated with malicious traffic), the feature is 1 regardless of the other HTTP flows. For a byte distribution probability vector P1, the processor 130 can count the flows in the network traffic within a preset period whose vector is P1. If there are 100 such flows and the HTTP header feature is 1 for 60 of them, those 60 flows are malicious, so the frequency with which traffic whose byte distribution probability vector is P1 is malicious is 60%.
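The byte distribution probability vector described above can be computed as in the following minimal Python sketch. The function name `byte_distribution_probability` is hypothetical, and returning an all-zero vector for an empty payload is an assumption made here, since the text does not address that case.

```python
from collections import Counter


def byte_distribution_probability(payload: bytes) -> list[float]:
    """Return the 1*256 vector of relative frequencies of each byte value."""
    if not payload:
        # Assumption: an empty payload yields an all-zero vector.
        return [0.0] * 256
    counts = Counter(payload)  # iterating over bytes yields ints in 0..255
    total = len(payload)
    return [counts.get(value, 0) / total for value in range(256)]
```

For a payload in which byte value 1 occurs 10 times, byte value 3 occurs 15 times, and byte value 255 occurs 5 times, the entries at indices 1, 3, and 255 are 10/30, 15/30, and 5/30, and all other entries are zero.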

The encrypted traffic to be tested can be collected with different traffic collection methods, including but not limited to Sniffer (packet sniffing), SNMP (Simple Network Management Protocol), NetFlow, and sFlow.

In some embodiments, Sniffer may be used to collect encrypted traffic. Merely as an example, a data collection point may be set up on the mirror port of a switch, and the encrypted traffic to be tested can be collected by fully copying the data in the network through the mirror port.

After the encrypted traffic to be tested has been collected, its encrypted traffic features are extracted. The features can be extracted in a variety of ways, such as with a library for extracting basic information about encrypted traffic (for example, Flowcontainer), with encrypted traffic feature extraction tools (for example, Wireshark, QPA, Tstat), or with other extraction algorithms or machine learning models.

In some embodiments, the encrypted traffic features may further include a second traffic feature. The second traffic feature is a feature derived from the content of the encrypted traffic to be tested. For example, the second traffic feature (domain name popularity) may be determined by extracting the destination domain name from the encrypted traffic to be tested and then consulting external information related to that domain name (for example, looking the domain name up on the Internet or in a knowledge graph). In some embodiments, the second traffic feature may include domain name popularity.

Domain name popularity is the degree to which malicious traffic tends to access a domain name. In some embodiments, domain name popularity may include the number of times (or the frequency, probability, etc.) that malicious traffic accesses the domain name. The more frequently a domain name is accessed by malicious traffic, the higher its popularity. In some embodiments, if network traffic involves a high-popularity domain name, the traffic type determination module 220 may judge that the network traffic is more likely to be suspicious traffic.

In some embodiments, the traffic feature acquisition module 210 may acquire the second traffic feature of the encrypted traffic to be tested. In some embodiments, the second traffic feature may be expressed as a score. For example, when the second traffic feature is domain name popularity, a higher score indicates higher domain name popularity, and a lower score indicates lower popularity. The score can be obtained from the historical access record of the domain name, user reports about the domain name, and so on. In some embodiments, the second traffic feature may be used to judge whether the domain name is vulnerable to attack by malicious traffic, and further used to determine the traffic type.

In some embodiments, the second traffic feature can be obtained through a relationship graph.

FIG. 4 is an exemplary schematic diagram of obtaining the second traffic feature through a relationship graph according to some embodiments of this specification.

The relationship graph may include domain name nodes, entity nodes, edges connecting entity nodes, and edges connecting entity nodes and domain name nodes. The edge attributes may include communication-related data and the traffic type. For example, a domain name node may be Jming.com, and an entity node may be the IP address 207.46.197.101 corresponding to that domain name. One IP address can correspond to multiple domain names, but one domain name has only one IP address. When a user enters a domain name, the request first reaches a Domain Name System (DNS) server, which resolves the domain name to the IP address of the corresponding website; this process is called domain name resolution. The domain name node and the entity node together complete the client host's access to the server.

In some embodiments, the traffic feature acquisition module 210 may construct a relationship graph based on the information of the encrypted traffic to be tested, and obtain the second traffic feature from the malicious adjacency value determined from the relationship graph. The malicious adjacency value is the number of edges of a given node (for example, node A) that satisfy preset conditions. The preset conditions may include: the edge points to node A, that is, node A is its end point; and the traffic type of the edge is malicious traffic.
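The malicious adjacency value defined above can be sketched as a count over directed edges. The `Edge` record, its field names, and the string labels `"malicious"` and `"normal"` are hypothetical choices for this illustration; the specification does not prescribe a data structure.

```python
from typing import List, NamedTuple


class Edge(NamedTuple):
    source: str        # node that initiated the communication
    target: str        # node the edge points to
    traffic_type: str  # hypothetical labels: "malicious" or "normal"


def malicious_adjacency(edges: List[Edge], node: str) -> int:
    """Count edges that end at `node` and are labeled as malicious traffic."""
    return sum(
        1 for edge in edges
        if edge.target == node and edge.traffic_type == "malicious"
    )


edges = [
    Edge("node1", "A", "malicious"),
    Edge("node2", "A", "malicious"),
    Edge("node3", "A", "normal"),     # points to A but is not malicious
    Edge("A", "node1", "malicious"),  # outgoing from A, so not counted for A
]
value_a = malicious_adjacency(edges, "A")
```

Only the two malicious edges that end at node A satisfy both preset conditions in this example.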

As shown in FIG. 4, the relationship graph 410 may include domain name nodes 420 (for example, node A, node B, and node C), entity nodes 430 (for example, node 1, node 2, and node 3), and edges 440 connecting the nodes. The edges are directed edges.

In some embodiments, the traffic feature acquisition module 210 may construct the edges of the relationship graph according to the communication between nodes. The communication represented by an edge is an abstract communication, and one abstract communication may include multiple information exchanges within a short period. For example, if node A and node B are connected by a directed edge, multiple information exchanges took place between them within a short period, and these exchanges can be regarded as one abstract communication. The direction of the edge can be determined by the initiator of the first information exchange. For example, if the first of these exchanges was initiated by A toward B, the direction of the edge is determined as A pointing to B. In some embodiments, if there are multiple communications between node A and node B (for example, communications occurring at different times over a long time span), node A and node B may be connected by multiple directed edges.

In some embodiments, the processor 130 may count, based on the edge attributes, the number of edges between nodes whose traffic type is "malicious traffic". A malicious adjacency value of 0 means that there is no edge of type "malicious traffic" between two nodes; a value of 1 means that there is one such edge; a value of 2 means that there are two such edges; and so on. In some embodiments, the second traffic feature may be determined according to the malicious adjacency value. For more details about determining the second traffic feature based on the malicious adjacency value, refer to FIG. 4 and its related description.
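The grouping of information exchanges into abstract communications can be sketched as follows. The specification does not say how long a "short period" is, so the 60-second `gap` is an assumption, as are the function name `build_edges` and the tuple layout of an interaction.

```python
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical record: (timestamp in seconds, sender, receiver)
Interaction = Tuple[float, str, str]


def build_edges(interactions: List[Interaction], gap: float = 60.0) -> List[Tuple[str, str]]:
    """Group exchanges between a node pair into abstract communications.

    A new directed edge starts when the pair has not exchanged anything for
    more than `gap` seconds; the first sender of the group sets the direction.
    """
    edges: List[Tuple[str, str]] = []
    last_seen: Dict[FrozenSet[str], float] = {}
    for timestamp, sender, receiver in sorted(interactions):
        pair = frozenset((sender, receiver))
        previous = last_seen.get(pair)
        if previous is None or timestamp - previous > gap:
            edges.append((sender, receiver))  # first exchange fixes the direction
        last_seen[pair] = timestamp
    return edges


interactions = [
    (0.0, "A", "B"),     # A initiates; starts the first abstract communication
    (5.0, "B", "A"),     # reply within the short period, same edge
    (12.0, "A", "B"),    # still the same edge
    (3600.0, "B", "A"),  # much later; B initiates a second abstract communication
]
edges = build_edges(interactions)
```

In this example the three early exchanges collapse into one edge from A to B, and the exchange an hour later produces a second edge from B to A.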

The schematic flow 400 is an example of determining the second traffic feature through the relationship graph. In the schematic flow 400, the second traffic feature 460 is, for example, domain name popularity. Specifically, the traffic feature acquisition module 210 may find the node corresponding to the information of the encrypted traffic to be tested (for example, an IP address); the processor 130 may obtain, based on the relationship graph 410, all edges 440 connected to that node and count the edges whose traffic type in the edge attributes is malicious traffic; the malicious adjacency value 450 is determined from the number of such edges; and the domain name popularity is determined from the malicious adjacency value. As shown in FIG. 4, the malicious adjacency value of node 1 and node B is 1, the malicious adjacency value of node 3 is 2, and the malicious adjacency value of node 2 is 3, so node 2 has the highest domain name popularity.

In some embodiments, the traffic feature acquisition module 210 may determine the domain name popularity according to the malicious adjacency value 450 and a preset adjacency rule. The preset adjacency rule may be to sort the malicious adjacency values of the nodes by size: the higher a node ranks, the higher its domain name popularity. The preset adjacency rule can be set according to actual needs. For example, the domain names corresponding to the top three nodes may be output as high-popularity domain names; traffic corresponding to a high-popularity domain name can skip traffic type classification and go directly to the subsequent decryption DPI analysis.
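The ranking rule described above can be sketched as a sort over malicious adjacency values. The function name `top_popular`, the example domain names, and the tie-breaking by name are assumptions for illustration only.

```python
from typing import Dict, List


def top_popular(adjacency: Dict[str, int], k: int = 3) -> List[str]:
    """Return the k domain names with the largest malicious adjacency values."""
    # Sort by value in descending order; ties are broken by name (an assumption).
    ranked = sorted(adjacency.items(), key=lambda item: (-item[1], item[0]))
    return [name for name, _ in ranked[:k]]


# Hypothetical malicious adjacency values per domain name.
adjacency = {"a.example": 1, "b.example": 3, "c.example": 2, "d.example": 0}
high_popularity = top_popular(adjacency)
```

Traffic whose destination is in `high_popularity` would be the traffic that bypasses classification and goes straight to decryption analysis.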

In some embodiments, the edge features of the relationship graph may also include the number of times the communication data between two nodes has been reported by users, where the user terminal is located at the node attacked by malicious traffic. When the communication data between two nodes is reported by a user, the traffic type of that communication is recorded as malicious traffic. The processor 130 may count the number of edges between nodes whose traffic type is "malicious traffic", and the second traffic feature is then determined from that number. Merely as an example, when the second traffic feature is domain name popularity, the more edges of type "malicious traffic" a node has, the higher the popularity of the domain name corresponding to that node.

In the embodiments of this specification, determining the second traffic feature through the relationship graph effectively integrates the traffic types of network traffic and builds associations between domain names, between entity IPs, and between domain names and entity IPs. That is, the edges between nodes are built according to the direction of traffic flow, which supports the mining and extraction of the second traffic feature more efficiently. Determining the second traffic feature improves the accuracy of judging whether encrypted traffic is malicious.

Step 320: determine the traffic type of the encrypted traffic to be tested based on its encrypted traffic features.

In some embodiments, the traffic type may include normal traffic and suspicious traffic. The traffic type of the encrypted traffic to be tested can be determined in several ways, for example based on historical data, preset rules, or a suspicious traffic identification model. In some embodiments, determining the traffic type based on historical data includes: obtaining historical suspicious traffic through the traffic type determination module 220 and comparing it with the traffic features of the encrypted traffic to be tested; when the similarity is greater than a threshold (for example, greater than 0.8), the traffic type of the encrypted traffic to be tested is determined to be suspicious traffic. In some embodiments, determining the traffic type based on preset rules includes: when the number of suspicious traffic features among the traffic features of the encrypted traffic to be tested is greater than a certain value (for example, greater than 1), the traffic type of the encrypted traffic to be tested is determined to be suspicious traffic. In some embodiments, the suspicious traffic identification model may be a machine learning model. For details of the suspicious traffic identification model, refer to FIG. 5 and its related description.
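The two non-model approaches above (comparison with historical suspicious traffic, and a preset count rule) can be sketched as follows. The specification does not name a similarity measure, so cosine similarity is an assumption here, as are the function names and the example feature vectors; the thresholds 0.8 and 1 are the example values from the text.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def suspicious_by_history(features: Sequence[float],
                          history: List[Sequence[float]],
                          threshold: float = 0.8) -> bool:
    """Suspicious if any historical suspicious sample is similar enough."""
    return any(cosine_similarity(features, past) > threshold for past in history)


def suspicious_by_rule(suspicious_flags: Sequence[bool], min_count: int = 1) -> bool:
    """Suspicious if more than `min_count` features are flagged as suspicious."""
    return sum(suspicious_flags) > min_count


history = [[1.0, 0.0, 1.0]]  # hypothetical historical suspicious feature vector
by_history = suspicious_by_history([0.9, 0.1, 1.0], history)
by_rule = suspicious_by_rule([True, True, False])
```

Either check could feed step 330; how the two are combined, if at all, is left open by the text.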

Step 330: in response to the traffic type of the encrypted traffic to be tested being suspicious traffic, perform subsequent decryption analysis on the encrypted traffic to be tested.

In some embodiments, if the traffic type of the encrypted traffic to be tested is normal traffic, no subsequent decryption analysis is required.

Decryption analysis may include identifying the protocol type, splitting protocols, splitting protocol fields, SSL offloading, payload analysis, identifying the negotiated protocol, and so on. Decryption analysis of suspicious traffic further determines whether the suspicious traffic is malicious traffic. The decryption DPI module 230 may also mark the traffic features corresponding to the suspicious traffic and store these suspicious traffic features in the traffic type determination module 220, so that they can be used to identify suspicious traffic in encrypted traffic to be tested. This also makes it easier to obtain more training samples for model training, which makes the judgment of encrypted traffic more accurate.

The embodiments of this specification separate normal traffic from suspicious traffic in the encrypted traffic and perform subsequent decryption analysis only on the suspicious traffic, which reduces the load of the subsequent analysis and improves analysis efficiency.

In some embodiments, determining the traffic type of the encrypted traffic to be tested based on its encrypted traffic features includes: processing the encrypted traffic features of the encrypted traffic to be tested with a suspicious traffic identification model to determine the traffic type, where the suspicious traffic identification model is a machine learning model.

As shown in FIG. 5, the trained suspicious traffic identification model 520 may be obtained by training an initial suspicious traffic identification model 550 on a large number of labeled training samples 540. Specifically, the labeled training samples 540 are input into the initial suspicious traffic identification model 550, and the model is trained based on the labels. In some embodiments, the training samples 540 may be normal traffic and suspicious traffic.

In some embodiments, the label of a training sample may indicate whether the sample is suspicious traffic. For example, a training sample that is suspicious traffic is labeled 1; otherwise it is labeled 0.

In some embodiments, the initial suspicious traffic identification model 550 may be a binary classifier trained with suspicious traffic as positive samples and normal traffic as negative samples. In some embodiments, the binary classifier may be a logistic regression model, a support vector machine, a random forest, or another classification model.
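As a rough illustration of such a binary classifier, the sketch below trains a small logistic regression model (one of the options named above) with plain stochastic gradient descent. The two-dimensional feature vectors, the learning rate, the number of epochs, and all function names are assumptions chosen for the example; a real system would use far richer features and an established library.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(samples: List[Sequence[float]], labels: List[int],
                   lr: float = 0.5, epochs: int = 500) -> Tuple[List[float], float]:
    """Fit weights and bias by stochastic gradient descent on the log loss."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            p = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            error = p - y  # gradient of the log loss with respect to the logit
            weights = [w - lr * error * xi for w, xi in zip(weights, x)]
            bias -= lr * error
    return weights, bias


def predict(weights: Sequence[float], bias: float, x: Sequence[float]) -> int:
    """Return 1 (suspicious) or 0 (normal)."""
    return 1 if sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) >= 0.5 else 0


# Hypothetical, already-scaled features; label 1 = suspicious, 0 = normal.
samples = [[0.9, 0.8], [0.8, 0.9], [0.7, 0.7], [0.1, 0.2], [0.2, 0.1], [0.3, 0.2]]
labels = [1, 1, 1, 0, 0, 0]
weights, bias = train_logistic(samples, labels)
```

With the toy data above, `predict(weights, bias, x)` separates the high-valued samples from the low-valued ones; this only demonstrates the positive/negative sample setup, not realistic performance.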

In some embodiments, the suspicious traffic identification model 520 may be used to determine the category of the traffic corresponding to the input traffic features. In some embodiments, the input of the suspicious traffic identification model 520 may include the first traffic feature 510-1 and/or the second traffic feature 510-2, and its output may be one of suspicious traffic 530-1 and normal traffic 530-2.

In some embodiments, training ends when the suspicious traffic identification model being trained satisfies a preset condition. The preset condition may be that the accuracy is greater than or equal to a preset threshold, which can be set according to actual needs, for example 90% or 95%.

In some embodiments, the accuracy of the trained suspicious traffic identification model can be determined with multiple test samples, each carrying a label indicating whether it is suspicious traffic. After the test samples are input into the trained model, it outputs the corresponding predicted categories. A prediction is correct when the predicted category matches the label and wrong otherwise. The accuracy is the number of correctly predicted samples divided by the total number of test samples.
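The accuracy computation and the stopping condition can be sketched as below. The function names and the sample predictions are hypothetical; the 90% threshold is one of the example values given in the text.

```python
from typing import Sequence


def accuracy(predictions: Sequence[int], labels: Sequence[int]) -> float:
    """Correctly predicted samples divided by the total number of test samples."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        raise ValueError("at least one test sample is required")
    correct = sum(1 for p, y in zip(predictions, labels) if p == y)
    return correct / len(labels)


def training_done(acc: float, threshold: float = 0.9) -> bool:
    """Preset condition: accuracy greater than or equal to the threshold."""
    return acc >= threshold


# Ten hypothetical test samples; the fourth prediction is wrong.
predicted = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]
expected = [1, 0, 1, 0, 0, 0, 1, 0, 1, 1]
acc = accuracy(predicted, expected)
```

Here nine of ten predictions match, so the 90% condition is met while a 95% condition would not be.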

The embodiments of this specification use a machine learning model to identify the traffic type. The model can learn the inherent characteristics of malicious traffic from a large amount of historical traffic data, and can therefore judge more accurately whether the encrypted traffic to be tested is suspicious traffic.

In some embodiments, the output of the suspicious traffic identification model may further include a classification vector 530-3, which contains the confidence that the encrypted traffic to be tested belongs to each category of suspicious traffic.

In some embodiments, before the suspicious traffic identification model is used to output a classification vector, the initial suspicious traffic identification model should first be trained with a large number of multi-class training samples so that it acquires multi-class classification capability. In some embodiments, the training samples may be normal traffic and malicious traffic of different categories; for example, malicious traffic may belong to "suspicious privacy leakage traffic", "suspicious malicious attack traffic", and so on. In some embodiments, the label of a training sample may be its category. For example, malicious traffic labeled A belongs to the category "suspicious privacy leakage traffic", and malicious traffic labeled B belongs to the category "suspicious malicious attack traffic". In some embodiments, the classification vector output by the suspicious traffic identification model may represent the confidence that the suspicious traffic belongs to each kind of malicious behavior. In some embodiments, the classification vector contains multiple values between 0 and 1, each representing the confidence that the sample belongs to the corresponding category. As an example, the suspicious traffic identification model may output the vector [0.2, 0.8, 0.1], where 0.2 is the confidence that the sample belongs to category A, 0.8 is the confidence that it belongs to category B, and 0.1 is the confidence that it belongs to category C; the sample can then be determined to belong to category B.
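Reading a category off the classification vector amounts to taking the entry with the highest confidence. The sketch below uses the example vector from the text; the function name `most_likely_category` is hypothetical.

```python
from typing import Sequence, Tuple


def most_likely_category(confidences: Sequence[float],
                         categories: Sequence[str]) -> Tuple[str, float]:
    """Return the category with the highest confidence and that confidence."""
    if len(confidences) != len(categories) or not categories:
        raise ValueError("confidences and categories must be non-empty and aligned")
    index = max(range(len(confidences)), key=lambda i: confidences[i])
    return categories[index], confidences[index]


# Example vector from the text: confidences for categories A, B, and C.
category, confidence = most_likely_category([0.2, 0.8, 0.1], ["A", "B", "C"])
```

For the example vector this selects category B with confidence 0.8.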

In some embodiments, the input of the suspicious traffic identification model may also include a reference malicious value 510-3 of the byte distribution probability vector. For how the byte distribution probability vector is determined, refer to step 310 and its related description.

The reference malicious value is the likelihood that traffic with the given byte distribution probability vector is suspicious traffic.

In some embodiments, the reference malicious value of the byte distribution probability vector may be determined based on historical data or by other means.

In some embodiments, determining the reference malicious value based on historical data includes: the suspicious traffic determination model obtains the byte distribution probability vectors of historical suspicious traffic and compares them with the byte distribution probability vector of the encrypted traffic to be tested; when the similarity is greater than a threshold (for example, greater than 0.8), the malicious value of the historical byte distribution probability vector is taken as the reference malicious value of the current byte distribution probability vector.

In some embodiments, the edge attributes of the relationship graph further include a byte distribution probability vector.

In some embodiments, the reference malicious value may be obtained from the relationship graph as follows: among the edges in the relationship graph that satisfy a preset condition, count the frequency of edges whose traffic type in the edge attributes is malicious traffic, and determine the reference malicious value from that frequency.

In some embodiments, the preset condition is that the similarity between the byte distribution probability vector in the edge attributes and the byte distribution probability vector of the encrypted traffic to be tested falls within a preset range. The preset range may be a system default value, an empirical value, a manually preset value, or the like. For example, for a byte distribution probability vector P2, the processor 130 may count the 100 flows with vector P2 in the network traffic within a preset time period. If 40 of them are normal traffic and 60 are malicious traffic, then traffic whose byte distribution probability vector is P2 is malicious with a frequency of 60%.

The reference malicious value can be further calculated from the aforementioned 60%: the higher the frequency, the larger the reference malicious value. In some embodiments, the byte distribution probability vector of the current traffic to be tested is P2, and all vectors whose similarity to P2 is close are looked up in the relationship graph. For example, the vectors similar to P2 are P3, P4, and P5, where the edges corresponding to P3 and P4 are malicious traffic and the edge corresponding to P5 is normal traffic. The frequency of malicious traffic is then two out of three, or about 67%, and the reference malicious value of P2 is calculated from this frequency. One way to determine the reference malicious value from the frequency is to look it up in a rule table. For example, if the frequency with which a byte distribution probability vector corresponds to malicious traffic is 60%, the malicious value in the rule table is 80; if the frequency is 80%, the malicious value in the rule table is 90.
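The frequency count and rule-table lookup can be sketched as follows. Only the two table rows (60% maps to 80, 80% maps to 90) come from the text; treating each row as a lower bound, returning 0 below the lowest row, and the function names are assumptions for illustration.

```python
from typing import List, Sequence, Tuple

# (minimum malicious frequency, reference malicious value); rows from the text.
RULE_TABLE: List[Tuple[float, int]] = [(0.6, 80), (0.8, 90)]


def malicious_frequency(traffic_types: Sequence[str]) -> float:
    """Share of matching edges whose traffic type is malicious."""
    if not traffic_types:
        return 0.0
    malicious = sum(1 for t in traffic_types if t == "malicious")
    return malicious / len(traffic_types)


def reference_malicious_value(frequency: float,
                              table: Sequence[Tuple[float, int]] = RULE_TABLE,
                              default: int = 0) -> int:
    """Pick the value of the highest table row whose bound the frequency reaches."""
    value = default  # assumption: frequencies below every row map to 0
    for min_frequency, score in sorted(table):
        if frequency >= min_frequency:
            value = score
    return value


# 100 flows sharing one byte distribution vector: 60 malicious and 40 normal.
flows = ["malicious"] * 60 + ["normal"] * 40
frequency = malicious_frequency(flows)
reference_value = reference_malicious_value(frequency)
```

A step lookup keeps the property stated in the text that a higher frequency never yields a smaller reference malicious value; interpolating between rows would be another reasonable choice.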

In some embodiments of this specification, the preset relationship graph may be updated according to the association between the information of the encrypted traffic to be tested and the reference malicious value. This includes comparing the frequency with which the byte distribution probability vector of the encrypted traffic to be tested corresponds to malicious traffic against the frequency corresponding to the current reference malicious value. If the former is greater, a child node is added to the encrypted traffic to be tested and associated with a byte distribution probability vector representing malicious traffic, thereby updating the preset relationship graph. If the former is smaller, a child node is added to the encrypted traffic to be tested and associated with a byte distribution probability vector representing normal traffic, thereby updating the preset relationship graph.

The embodiments of this specification obtain the reference malicious value through the relationship graph. Because the value is based on byte distribution probability vectors gathered from a large number of statistics, it is more accurate; and because the relationship graph is updated in real time, the reference malicious value can be obtained in real time more accurately and efficiently.

The basic concepts have been described above. Obviously, for those skilled in the art, the above detailed disclosure is merely an example and does not limit this specification. Although not explicitly stated here, those skilled in the art may make various modifications, improvements, and corrections to this specification. Such modifications, improvements, and corrections are suggested by this specification, so they remain within the spirit and scope of the exemplary embodiments of this specification.

In addition, this specification uses specific terms to describe its embodiments. Terms such as "one embodiment", "an embodiment", and/or "some embodiments" refer to a feature, structure, or characteristic related to at least one embodiment of this specification. It should therefore be emphasized that "an embodiment", "one embodiment", or "an alternative embodiment" mentioned two or more times in different places in this specification does not necessarily refer to the same embodiment. Furthermore, certain features, structures, or characteristics of one or more embodiments of this specification may be combined as appropriate.

Furthermore, unless explicitly stated in the claims, the order of the processing elements and sequences, the use of numbers and letters, and the use of other names in this specification are not intended to limit the order of the processes and methods of this specification. Although the above disclosure discusses, through various examples, some embodiments of the invention currently considered useful, it should be understood that such details are for illustrative purposes only. The appended claims are not limited to the disclosed embodiments; on the contrary, the claims are intended to cover all modifications and equivalent combinations that fall within the spirit and scope of the embodiments of this specification. For example, although the system components described above may be implemented by hardware devices, they may also be implemented by a software-only solution, such as installing the described system on an existing server or mobile device.

Similarly, it should be noted that, to simplify the presentation of this disclosure and aid the understanding of one or more embodiments of the invention, the foregoing description sometimes combines multiple features into a single embodiment, drawing, or description thereof. This method of disclosure does not, however, imply that the subject matter of this specification requires more features than are recited in the claims. Indeed, an embodiment may have fewer than all the features of a single embodiment disclosed above.

Some embodiments use numbers to describe quantities of components and attributes. It should be understood that such numbers are, in some examples, qualified by the modifiers "about", "approximately", or "substantially". Unless otherwise stated, "about", "approximately", or "substantially" indicates that the stated number allows for a variation of ±20%. Accordingly, in some embodiments, the numerical parameters used in the specification and claims are approximations that may vary depending on the desired characteristics of individual embodiments. In some embodiments, numerical parameters should take into account the specified number of significant digits and apply ordinary rounding. Although the numerical ranges and parameters used to establish the breadth of the ranges in some embodiments of this specification are approximations, in specific embodiments such values are set as precisely as practicable.

Each patent, patent application, patent application publication, and other material cited in this specification, such as articles, books, specifications, publications, and documents, is hereby incorporated by reference in its entirety. Excluded are application history documents that are inconsistent with or conflict with the content of this specification, as well as documents (currently or later appended to this specification) that limit the broadest scope of the claims of this specification. It should be noted that if the description, definition, and/or use of a term in the materials accompanying this specification is inconsistent with or conflicts with the content of this specification, the description, definition, and/or use of the term in this specification shall prevail.

Finally, it should be understood that the embodiments described in this specification are intended only to illustrate the principles of the embodiments of this specification. Other variations may also fall within the scope of this specification. Therefore, by way of example and not limitation, alternative configurations of the embodiments of this specification may be regarded as consistent with the teachings of this specification. Accordingly, the embodiments of this specification are not limited to those explicitly introduced and described herein.

Claims

1. A method for judging suspicious traffic in encrypted traffic, characterized by comprising the following steps:

acquiring encrypted traffic to be detected, and extracting encrypted traffic characteristics of the encrypted traffic to be detected; wherein the encrypted traffic characteristics include first traffic characteristics including access characteristic information, protocol characteristic information, and transfer characteristic information;

determining the traffic type of the encrypted traffic to be detected based on the encrypted traffic characteristics of the encrypted traffic to be detected, wherein the traffic type comprises normal traffic and the suspicious traffic;

and in response to the traffic type of the encrypted traffic to be detected being the suspicious traffic, performing subsequent decryption analysis on the encrypted traffic to be detected through a decryption DPI module.

2. The method of claim 1, wherein the encrypted traffic characteristics further comprise a second traffic characteristic, the second traffic characteristic being obtained via a relationship graph.

3. The method of claim 1, wherein the determining the traffic type of the encrypted traffic to be detected based on the encrypted traffic characteristics of the encrypted traffic to be detected comprises:

processing the encrypted traffic characteristics of the encrypted traffic to be detected based on a suspicious traffic identification model, and determining the traffic type of the encrypted traffic to be detected, wherein the suspicious traffic identification model is a machine learning model.

4. The method of claim 3, wherein the inputs to the suspicious traffic identification model further comprise reference malicious values of byte distribution probability vectors, the reference malicious values being obtained based on the relationship graph.

5. A system for determining suspicious traffic in encrypted traffic, the system comprising:

a traffic characteristic acquisition module, configured to acquire encrypted traffic to be detected and extract encrypted traffic characteristics of the encrypted traffic to be detected; wherein the encrypted traffic characteristics include first traffic characteristics including access characteristic information, protocol characteristic information, and transfer characteristic information;

a traffic type determination module, configured to determine a traffic type of the encrypted traffic to be detected based on the encrypted traffic characteristics of the encrypted traffic to be detected, wherein the traffic type includes normal traffic and the suspicious traffic;

and a decryption DPI module, configured to perform subsequent decryption analysis on the encrypted traffic to be detected in response to the traffic type of the encrypted traffic to be detected being the suspicious traffic.

6. The system of claim 5, wherein the encrypted traffic characteristics further comprise a second traffic characteristic, the second traffic characteristic being obtained via a relationship graph.

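7. The system of claim 5, wherein the traffic type determination module is further configured to:

process the encrypted traffic characteristics of the encrypted traffic to be detected based on a suspicious traffic identification model, and determine the traffic type of the encrypted traffic to be detected, wherein the suspicious traffic identification model is a machine learning model.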

8. The system of claim 7, wherein the inputs to the suspicious traffic identification model further comprise reference malicious values of byte distribution probability vectors, the reference malicious values being obtained based on the relationship graph.

9. A device for judging suspicious traffic in encrypted traffic, comprising at least one processor and at least one memory; the at least one memory is configured to store computer instructions; the at least one processor is configured to execute at least some of the computer instructions to implement the method of any one of claims 1 to 4.

10. A computer readable storage medium storing computer instructions which, when executed by a processor, implement the method of any one of claims 1 to 4.
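
The flow recited in claims 1 and 3 (extract characteristics, classify the traffic, and pass only suspicious traffic to decryption DPI) can be illustrated with a minimal Python sketch. It is not the patented implementation. The names `FirstTrafficFeatures`, `judge_traffic`, `handle_flow`, and `byte_distribution`, the 0.5 threshold, and the use of a plain scoring callable in place of a trained machine learning model are all assumptions made for illustration. The claims do not specify how the features are encoded or how the reference malicious values are derived from the relationship graph, so the sketch does not model that derivation.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class FirstTrafficFeatures:
    """Hypothetical container for the first traffic characteristics of claim 1."""
    access: List[float]    # access characteristic information (encoding assumed)
    protocol: List[float]  # protocol characteristic information (encoding assumed)
    transfer: List[float]  # transfer characteristic information (encoding assumed)

    def as_vector(self) -> List[float]:
        # Concatenate the three groups into one model input vector.
        return [*self.access, *self.protocol, *self.transfer]


def byte_distribution(payload: bytes) -> List[float]:
    """Return the empirical probability of each of the 256 byte values.

    One plausible reading of a "byte distribution probability vector";
    the claims do not define how it is computed.
    """
    if not payload:
        return [0.0] * 256
    counts = Counter(payload)  # iterating bytes yields ints in 0..255
    return [counts.get(value, 0) / len(payload) for value in range(256)]


def judge_traffic(
    features: FirstTrafficFeatures,
    model: Callable[[Sequence[float]], float],
    threshold: float = 0.5,  # assumed decision threshold
) -> str:
    """Classify traffic as "normal" or "suspicious" from a model score."""
    score = model(features.as_vector())
    return "suspicious" if score >= threshold else "normal"


def handle_flow(
    features: FirstTrafficFeatures,
    payload: bytes,
    model: Callable[[Sequence[float]], float],
    decrypt_dpi: Callable[[bytes], None],
) -> str:
    """Send the payload to decryption DPI only when it is judged suspicious."""
    traffic_type = judge_traffic(features, model)
    if traffic_type == "suspicious":
        decrypt_dpi(payload)
    return traffic_type
```

In this sketch, `model` stands in for the suspicious traffic identification model and `decrypt_dpi` for the decryption DPI module. Under these assumptions, traffic judged normal never reaches the decryption step.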