
CN109995605A - A traffic identification method and device, and a computer-readable storage medium - Google Patents


Info

Publication number
CN109995605A
Authority
CN
China
Prior art keywords
app
traffic
feature
data packet
traffic data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810000615.2A
Other languages
Chinese (zh)
Other versions
CN109995605B (en
Inventor
熊龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Mobile Communications Group Co Ltd
China Mobile Communication Co Ltd
Original Assignee
China Mobile Communications Group Co Ltd
China Mobile Communication Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Mobile Communications Group Co Ltd and China Mobile Communication Co Ltd
Priority to CN201810000615.2A
Publication of CN109995605A
Application granted
Publication of CN109995605B
Legal status: Active


Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00: Arrangements for monitoring or testing data switching networks
    • H04L43/14: Arrangements for monitoring or testing data switching networks using software, i.e. software packages

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a traffic identification method, comprising: acquiring a traffic data packet to be identified that is generated when an electronic device is used; performing feature extraction on the traffic data packet to be identified to obtain a first content feature; matching the first content feature against the second content feature of each App in a pre-built feature database to determine at least one App associated with the traffic data packet to be identified, where the second content feature is a content feature extracted from valid traffic samples of an App, and the valid traffic samples include traffic data packets generated when the App calls a local database; and determining, from the at least one App and according to the second behavior feature of the at least one App in the feature database, the App to which the traffic data packet to be identified belongs, where the second behavior feature characterizes the App's calling behavior toward third-party databases. The invention also discloses a traffic identification device and a computer-readable storage medium.

Description

A traffic identification method and device, and a computer-readable storage medium

Technical Field

The present invention relates to communication technologies, and in particular to a traffic identification method and device, and a computer-readable storage medium.

Background

Related research has found that people depend more and more heavily on mobile Internet devices. The average person spends more than three hours a day on mobile electronic devices, of which nearly two hours are spent on smartphones, mainly using the applications (Apps) installed on them. Moreover, how often a user uses an App usually correlates with the user's identity. For example, a user who frequently uses finance and securities Apps is quite likely to be a securities manager or an individual or corporate investor. The attributes of an App therefore reflect, to some extent, the personal attributes of its users.

Given this, if a user's App usage behavior could be obtained reliably and then analyzed and profiled, a user description with high credibility could be produced. However, App usage behavior is private to the user, and the variety of smartphone Apps is enormous: the number of Apps in common use already exceeds 100,000. Conventional methods therefore cannot reliably collect samples of users' App usage behavior. A mobile data provider, however, can obtain the traffic data generated when users use Apps. If the Apps in use can be identified from that traffic data, reliably obtaining users' App usage behavior becomes possible.

To identify the App a user is using from the user's traffic data, traffic samples of the App must first be obtained; features are then extracted from those samples and used to build a feature database; finally, the user's traffic data is matched against the feature database to identify the App in use. As this description shows, the key to identifying Apps accurately is how features are extracted from the traffic data, so that the extracted features have finer granularity and better resistance to interference. Features extracted with existing traffic feature extraction methods, however, suffer from insufficient identification granularity and perform poorly in the presence of noise. As a result, the App in use cannot be identified accurately from traffic data, and traffic identification accuracy is poor.

Summary of the Invention

To solve the existing technical problems, embodiments of the present invention provide a traffic identification method and device, and a computer-readable storage medium, which can accurately identify the App a user is using from the traffic data packets generated when the user uses an electronic device.

The technical solutions of the embodiments of the present invention are implemented as follows:

An embodiment of the present invention provides a traffic identification method, the method comprising:

acquiring a traffic data packet to be identified that is generated when an electronic device is used;

performing feature extraction on the traffic data packet to be identified to obtain a first content feature, where the first content feature characterizes the content of the application (App) associated with the traffic data packet to be identified;

matching the first content feature against the second content feature of each App in a pre-built feature database to determine at least one App associated with the traffic data packet to be identified, where the second content feature is a content feature extracted from valid traffic samples of an App, and the valid traffic samples include traffic data packets generated when the App calls a local database;

determining, from the at least one App and according to the second behavior feature of the at least one App in the feature database, the App to which the traffic data packet to be identified belongs, where the second behavior feature characterizes the App's calling behavior toward third-party databases.
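The content-matching step above can be sketched as follows. This is a minimal illustration only: the source does not specify how features are represented or matched, so content features are assumed here to be sets of strings, a match is assumed to be any overlap, and the names `CONTENT_DB` and `match_candidates` as well as the App names are hypothetical.

```python
from typing import Dict, Set

# Hypothetical feature database: App name -> second content features.
# Representing features as sets of strings is an assumption of this sketch.
CONTENT_DB: Dict[str, Set[str]] = {
    "taobao": {"taobao"},
    "game_x": {"gamex"},
    "news_y": {"newsy"},
}


def match_candidates(first_features: Set[str],
                     content_db: Dict[str, Set[str]]) -> Set[str]:
    """Return every App whose second content features overlap the
    first content features extracted from the packet."""
    return {app for app, second in content_db.items()
            if first_features & second}


# A packet from a game that embeds a Taobao advertisement carries both
# feature strings, so content matching alone yields two candidate Apps.
candidates = match_candidates({"gamex", "taobao"}, CONTENT_DB)
```

Because content matching may return more than one candidate, the method goes on to use behavior features to pick the App the packet actually belongs to.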

In the above solution, the method further comprises:

for each App, acquiring the corresponding native traffic, where the native traffic is the traffic generated by the App while it runs in a simulated environment;

filtering the native traffic to obtain the valid traffic samples of the App;

performing content feature extraction on the valid traffic samples to obtain the second content feature of the App;

obtaining the second behavior feature of the App according to the pre-acquired second content features of Apps and the native traffic;

building the feature database from the second content feature and the second behavior feature of each App.
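One possible shape for the resulting feature database is sketched below. The source does not describe a storage layout, so the record type `AppRecord`, the function `build_feature_db`, and the choice of string sets are assumptions made purely for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Set


@dataclass
class AppRecord:
    """One feature-database entry (hypothetical layout)."""
    content: Set[str] = field(default_factory=set)   # second content feature
    behavior: Set[str] = field(default_factory=set)  # third parties the App calls


def build_feature_db(content_by_app: Dict[str, Iterable[str]],
                     behavior_by_app: Dict[str, Iterable[str]]) -> Dict[str, AppRecord]:
    """Combine per-App content and behavior features into one database.

    Apps without a recorded behavior feature get an empty set.
    """
    return {
        app: AppRecord(set(content), set(behavior_by_app.get(app, ())))
        for app, content in content_by_app.items()
    }


db = build_feature_db(
    {"game_x": ["gamex"], "taobao": ["taobao"]},
    {"game_x": ["taobao"]},  # game_x embeds Taobao advertisements
)
```

Keeping both feature types in one record per App means the later identification stage can look up content and behavior with a single key.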

In the above solution, filtering the native traffic to obtain the valid traffic samples of the App comprises:

performing semantic analysis on the Nth item of feature information of the App to obtain first semantic information, where N is a positive integer;

performing semantic analysis on the content of a traffic data packet to be analyzed that corresponds to the Nth item of feature information, to obtain second semantic information of that packet, where the traffic data packets to be analyzed are those packets in the native traffic that were not determined to be valid traffic data packets after filtering with the first N-1 items of feature information, and a valid traffic data packet is a traffic data packet in the valid traffic samples;

calculating the degree of association between the first semantic information and the second semantic information;

determining each traffic data packet to be analyzed whose degree of association is greater than or equal to a preset value as a traffic data packet in the valid traffic samples.
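The iterative filtering described above can be sketched as follows. The source does not say how semantic analysis or the degree of association is computed, so this sketch substitutes simple stand-ins: lowercase tokenization for semantic analysis, and the fraction of a feature's tokens found in the packet for the degree of association. The function names, the threshold of 0.5, and the sample packets are all hypothetical.

```python
import re
from typing import List, Set


def tokens(text: str) -> Set[str]:
    """Stand-in for semantic analysis: lowercase alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def association(first: Set[str], second: Set[str]) -> float:
    """Stand-in degree of association: share of the feature's tokens
    that also appear in the packet content."""
    return len(first & second) / len(first) if first else 0.0


def filter_valid(packets: List[str], app_features: List[str],
                 threshold: float = 0.5) -> List[str]:
    """Apply feature items 1..N in turn; a packet not accepted by the
    first N-1 items is re-examined against the Nth item."""
    valid: List[str] = []
    pending = list(packets)
    for feature in app_features:
        first = tokens(feature)
        still_pending = []
        for pkt in pending:
            if association(first, tokens(pkt)) >= threshold:
                valid.append(pkt)
            else:
                still_pending.append(pkt)
        pending = still_pending
    return valid


p1 = "GET /home HTTP/1.1 Host: api.shopmate.test"
p2 = "GET /banner HTTP/1.1 Host: ads.example.net"
p3 = "POST /v1/log HTTP/1.1 User-Agent: acme-labs-sdk/1.0"
# Feature items of a hypothetical App: its name, then its developer.
samples = filter_valid([p1, p2, p3], ["shopmate", "Acme Labs"])
```

In this run `p1` is accepted by the first feature item, `p3` only by the second, and `p2` (advertisement traffic) is never accepted, which is the intended effect of filtering native traffic down to valid samples.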

In the above solution, obtaining the second behavior feature of the App according to the pre-acquired second content features of Apps and the native traffic comprises:

performing content feature extraction on the native traffic to obtain the content features of the native traffic;

matching the content features of the native traffic against the pre-acquired second content features of Apps to determine the Apps associated with the native traffic;

obtaining the second behavior feature according to the description information of the App and the Apps associated with the native traffic.
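A rough sketch of deriving a behavior feature from native traffic follows. It assumes that the App's description information is reduced to its own name, that content features are token sets, and that the behavior feature is simply the set of other Apps whose content features appear in the App's native traffic. None of these representations is specified in the source; `derive_behavior` and the sample data are hypothetical.

```python
import re
from typing import Dict, List, Set


def tokens(text: str) -> Set[str]:
    """Stand-in content feature extraction: lowercase alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def derive_behavior(app: str, native_packets: List[str],
                    content_db: Dict[str, Set[str]]) -> Set[str]:
    """Return the other Apps whose second content features show up in
    this App's native (unfiltered) traffic."""
    called: Set[str] = set()
    for pkt in native_packets:
        features = tokens(pkt)
        for other, second in content_db.items():
            # The App's own name (from its description) excludes itself.
            if other != app and features & second:
                called.add(other)
    return called


content_db = {"taobao": {"taobao"}, "game_x": {"gamex"}}
native = [
    "GET /play HTTP/1.1 Host: cdn.gamex.test",
    "GET /ad HTTP/1.1 Host: ads.taobao.test",
]
behavior = derive_behavior("game_x", native, content_db)
```

Here the game's native traffic contains a packet matching Taobao's content feature, so the derived behavior feature records that the game calls Taobao as a third party.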

In the above solution, determining, from the at least one App and according to the second behavior feature of the at least one App in the feature database, the App to which the traffic data packet to be identified belongs comprises:

obtaining a target behavior feature from the second behavior features of the at least one App, where the Apps associated with the calling behavior in the target behavior feature include all Apps associated with the traffic data packet to be identified;

obtaining, from the feature database, the App corresponding to the target behavior feature;

determining the App corresponding to the target behavior feature as the App to which the traffic data packet to be identified belongs.
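The selection of the target behavior feature can be sketched as a set-containment check. This sketch assumes that the Apps associated with a calling behavior are the calling App itself plus the third-party Apps it calls, and that an ambiguous or empty result is reported as `None`. Both are assumptions; the source does not say how ties are handled. `BEHAVIOR_DB` and `resolve_owner` are hypothetical names.

```python
from typing import Dict, Optional, Set

# Hypothetical behavior features: App name -> third-party Apps it calls.
BEHAVIOR_DB: Dict[str, Set[str]] = {
    "taobao": set(),
    "game_x": {"taobao"},
}


def resolve_owner(candidates: Set[str],
                  behavior_db: Dict[str, Set[str]]) -> Optional[str]:
    """Pick the candidate whose calling behavior covers every candidate App.

    Returns None when no candidate, or more than one, qualifies.
    """
    owners = [
        app for app in sorted(candidates)
        if candidates <= ({app} | behavior_db.get(app, set()))
    ]
    return owners[0] if len(owners) == 1 else None


owner = resolve_owner({"taobao", "game_x"}, BEHAVIOR_DB)
```

With candidates `taobao` and `game_x`, only the game's behavior feature covers both Apps, so the packet is attributed to the game rather than to Taobao.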

An embodiment of the present invention further provides a traffic identification device, the device comprising: a processor, and a memory for storing a computer program capable of running on the processor; wherein,

when running the computer program, the processor is configured to perform:

acquiring a traffic data packet to be identified that is generated when an electronic device is used;

performing feature extraction on the traffic data packet to be identified to obtain a first content feature, where the first content feature characterizes the content of the application (App) associated with the traffic data packet to be identified;

matching the first content feature against the second content feature of each App in a pre-built feature database to determine at least one App associated with the traffic data packet to be identified, where the second content feature is a content feature extracted from valid traffic samples of an App, and the valid traffic samples include traffic data packets generated when the App calls a local database;

determining, from the at least one App and according to the second behavior feature of the at least one App in the feature database, the App to which the traffic data packet to be identified belongs, where the second behavior feature characterizes the App's calling behavior toward third-party databases.

In the above solution, when running the computer program, the processor is configured to perform:

for each App, acquiring the corresponding native traffic, where the native traffic is the traffic generated by the App while it runs in a simulated environment;

filtering the native traffic to obtain the valid traffic samples of the App;

performing content feature extraction on the valid traffic samples to obtain the second content feature of the App;

obtaining the second behavior feature of the App according to the pre-acquired second content features of Apps and the native traffic;

building the feature database from the second content feature and the second behavior feature of each App.

In the above solution, when running the computer program, the processor is configured to perform:

performing semantic analysis on the Nth item of feature information of the App to obtain first semantic information, where N is a positive integer;

performing semantic analysis on the content of a traffic data packet to be analyzed that corresponds to the Nth item of feature information, to obtain second semantic information of that packet, where the traffic data packets to be analyzed are those packets in the native traffic that were not determined to be valid traffic data packets after filtering with the first N-1 items of feature information, and a valid traffic data packet is a traffic data packet in the valid traffic samples;

calculating the degree of association between the first semantic information and the second semantic information;

determining each traffic data packet to be analyzed whose degree of association is greater than or equal to a preset value as a traffic data packet in the valid traffic samples.

In the above solution, when running the computer program, the processor is configured to perform:

performing content feature extraction on the native traffic to obtain the content features of the native traffic;

matching the content features of the native traffic against the pre-acquired second content features of Apps to determine the Apps associated with the native traffic;

obtaining the second behavior feature according to the description information of the App and the Apps associated with the native traffic.

In the above solution, when running the computer program, the processor is configured to perform:

obtaining a target behavior feature from the second behavior features of the at least one App, where the Apps associated with the calling behavior in the target behavior feature include all Apps associated with the traffic data packet to be identified;

obtaining, from the feature database, the App corresponding to the target behavior feature;

determining the App corresponding to the target behavior feature as the App to which the traffic data packet to be identified belongs.

An embodiment of the present invention further provides a traffic identification device, the device comprising: an acquisition module, a feature extraction module, a first determination module, and a second determination module; wherein,

the acquisition module is configured to acquire a traffic data packet to be identified that is generated when an electronic device is used;

the feature extraction module is configured to perform feature extraction on the traffic data packet to be identified to obtain a first content feature, where the first content feature characterizes the content of the application (App) associated with the traffic data packet to be identified;

the first determination module is configured to match the first content feature against the second content feature of each App in a pre-built feature database to determine at least one App associated with the traffic data packet to be identified, where the second content feature is a content feature extracted from valid traffic samples of an App, and the valid traffic samples include traffic data packets generated when the App calls a local database;

the second determination module is configured to determine, from the at least one App and according to the second behavior feature of the at least one App in the feature database, the App to which the traffic data packet to be identified belongs, where the second behavior feature characterizes the App's calling behavior toward third-party databases.

An embodiment of the present invention further provides a computer-readable storage medium storing computer-executable instructions which, when executed by a processor, implement the above traffic identification method.

In the traffic identification method and device provided by the embodiments of the present invention, a traffic data packet to be identified that is generated when an electronic device is used is first acquired. Feature extraction is then performed on the packet to obtain a first content feature, which characterizes the content of the application (App) associated with the packet. The first content feature is matched against the second content feature of each App in a pre-built feature database to determine at least one App associated with the packet; the second content feature is a content feature extracted from valid traffic samples of an App, and the valid traffic samples include traffic data packets generated when the App calls a local database. Finally, the App to which the packet belongs is determined from the at least one App according to the second behavior feature of the at least one App in the feature database; the second behavior feature characterizes the App's calling behavior toward third-party databases.

The pre-built feature database in the embodiments of the present invention includes not only the second content feature of each App but also its second behavior feature. The second content feature is extracted from the traffic data packets generated when the App calls a local database, and the second behavior feature characterizes the App's calling behavior toward third-party databases. Different types of features are thus extracted from an App's traffic data packets on the basis of both local-library traffic and external-library traffic, so the features in the database have finer granularity and stronger resistance to interference. This improves the granularity of feature identification and, in turn, greatly improves the accuracy of traffic identification. In addition, the traffic identification method of the embodiments is robust: it can handle the differences between Apps at the traffic level and can identify the traffic of the vast majority of Apps in a general-purpose way.

Brief Description of the Drawings

In the drawings, which are not necessarily drawn to scale, like reference numerals may describe like parts in different views. Like reference numerals with different letter suffixes may denote different instances of like parts. The drawings illustrate, by way of example and not limitation, the embodiments discussed herein.

FIG. 1 is a schematic flowchart of the traffic identification method in Embodiment 1 of the present invention;

FIG. 2 is a schematic flowchart of a specific implementation of building the feature database in Embodiment 2 of the present invention;

FIG. 3 is a schematic flowchart of a specific implementation of step 204 in the flow shown in FIG. 2;

FIG. 4 is a schematic structural diagram of a traffic identification device according to an embodiment of the present invention;

FIG. 5 is a schematic diagram of the specific structure of the third acquisition module in the device shown in FIG. 4;

FIG. 6 is a schematic diagram of the hardware structure of a traffic identification device according to an embodiment of the present invention.

Detailed Description

Apps today commonly call external database files, chiefly by embedding advertisements inside the application and by calling interfaces of other Apps, such as software development kit (SDK) interfaces and application programming interfaces (APIs). Calls to these external database files are also reflected in the traffic data packets.

Traffic generated when an App calls an external database file, i.e. external-library traffic, reflects external features, whereas traffic generated when an App calls a local database file, i.e. local-library traffic, reflects the App's own features (local features). Moreover, the same library file can play different roles for different Apps. For example, the Taobao App's library file is a local database file for the Taobao App but an external database file for the Meituan App. This seriously hampers both extracting the local features of traffic data packets and identifying, from those features, the App to which a packet belongs: the content of a packet alone is not enough to pinpoint the App it belongs to.

For example, many network applications embed Taobao advertisements, and these advertisements also generate traffic carrying the content features of the Taobao App (for instance, the traffic-related fields of a large online game carry the feature string taobao). When features are extracted from the traffic data packets of an App that contains Taobao advertisements, this noise makes it impossible to tell accurately which packets are local-library traffic and which are external-library traffic, and hence impossible to pinpoint the App to which the packets belong.

On this basis, in the embodiments of the present invention, a traffic data packet to be identified that is generated when an electronic device is used is first acquired. Feature extraction is then performed on the packet to obtain a first content feature, which characterizes the content of the application (App) associated with the packet. The first content feature is matched against the second content feature of each App in a pre-built feature database to determine at least one App associated with the packet; the second content feature is a content feature extracted from valid traffic samples of an App, and the valid traffic samples include traffic data packets generated when the App calls a local database. Finally, the App to which the packet belongs is determined from the at least one App according to the second behavior feature of the at least one App in the feature database; the second behavior feature characterizes the App's calling behavior toward third-party databases.

The pre-built feature database in the embodiments of the present invention includes not only the second content feature of each App but also its second behavior feature. The second content feature is extracted from the traffic data packets generated when the App calls a local database, and the second behavior feature characterizes the App's calling behavior toward third-party databases. Different types of features are thus extracted from an App's traffic data packets on the basis of both local-library traffic and external-library traffic, so the features in the database have finer granularity and stronger resistance to interference. This improves the granularity of feature identification and, in turn, greatly improves the accuracy of traffic identification. In addition, the traffic identification method of the embodiments is robust: it can handle the differences between Apps at the traffic level and can identify the traffic of the vast majority of Apps in a general-purpose way.

下面结合附图及实施例对本发明再作进一步详细的描述。The present invention will be further described in detail below in conjunction with the accompanying drawings and embodiments.

本发明实施例提供的流量识别方法,应用于电子设备,如图1所示,包括以下步骤:The traffic identification method provided by the embodiment of the present invention is applied to an electronic device, as shown in FIG. 1 , and includes the following steps:

步骤101,获取电子设备被使用时所产生的待识别流量数据包;Step 101, acquiring traffic data packets to be identified generated when the electronic device is used;

本实施例的流量识别方法应用于电子设备,该电子设备可以为移动终端,用于根据用户在使用移动终端时所产生的流量数据包,识别出用户所使用的App,从而识别出所述用户的App使用行为,进而能够得到该用户的用户描述。The traffic identification method in this embodiment is applied to an electronic device, which may be a mobile terminal, and is used to identify the App used by the user according to the traffic data packets generated when the user uses the mobile terminal, thereby identifying the user The App usage behavior of the user can be obtained, and then the user description of the user can be obtained.

Here, the to-be-identified traffic data packets may be acquired from the traffic channel of the mobile terminal. The mobile terminal may monitor in real time whether traffic data packets are present in the traffic channel, and thus whether the user has generated traffic while using the terminal; when traffic data packets are detected in the traffic channel, it acquires them as the to-be-identified traffic data packets. The mobile terminal may also acquire the to-be-identified traffic data packets from the traffic channel periodically or aperiodically.

Step 102: perform feature extraction on the to-be-identified traffic data packets to obtain a first content feature; the first content feature characterizes the content features of the application (App) associated with the to-be-identified traffic data packets;

Specifically, content feature extraction is performed on the to-be-identified traffic data packets to obtain the first content feature. The first content feature may include features such as the website title, keywords, description, and body text corresponding to the home page; these features reflect the content features of the App associated with the to-be-identified traffic data packets.

A fixed-position content extraction method may be used to extract content features from the to-be-identified traffic data packets. Other content feature extraction methods may also be chosen as needed, such as a deep packet inspection (DPI) method, a longest common subsequence extraction method, or a feature mining method from deep learning; no limitation is imposed here.
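As an illustration of the kind of content features named above (title, keywords, description), the following is a minimal sketch that pulls them out of an HTML payload using Python's standard `html.parser`. It assumes the payload has already been reassembled and decoded into text; the class and function names are hypothetical and are not part of the described method, which leaves the extraction technique open.

```python
from html.parser import HTMLParser
from typing import Dict


class PageFeatureParser(HTMLParser):
    """Collects the <title> text and the keywords/description <meta> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.features: Dict[str, str] = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and attr.get("name") in ("keywords", "description"):
            self.features[attr["name"]] = attr.get("content") or ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.features["title"] = self.features.get("title", "") + data.strip()


def extract_content_features(html: str) -> Dict[str, str]:
    """Return a dict with whichever of title/keywords/description are present."""
    parser = PageFeatureParser()
    parser.feed(html)
    parser.close()
    return parser.features


# Hypothetical payload, for illustration only.
sample = ('<html><head><title>Demo Shop</title>'
          '<meta name="keywords" content="shop,deals"></head></html>')
features = extract_content_features(sample)
```

Encrypted payloads would not expose these fields, so a real deployment would need one of the other extraction methods mentioned in the text.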

Step 103: match the first content feature against the second content feature of each App in the pre-built feature database to determine at least one App associated with the to-be-identified traffic data packets; the second content feature is a content feature extracted from the App's valid traffic samples, and the valid traffic samples include the traffic data packets generated when the App calls its local database;

Here, the second content feature specifically characterizes the content features of the App's local-library traffic.

Here, the first content feature is compared one by one with the second content feature of each App in the feature database to obtain the at least one App associated with the to-be-identified traffic data packets. Because local-library traffic and external-library traffic cannot be told apart within the to-be-identified traffic data packets, the at least one App consists of the Apps to which the packets may belong.

For example, if the first content feature includes the feature "taobao" and the second content feature of the Taobao App also includes the feature "taobao", the first content feature matches the second content feature of the Taobao App, and the Apps to which the to-be-identified traffic data packets may belong include the Taobao App. Likewise, if the first content feature also includes the feature "meituan" and the second content feature of the Meituan App includes the same feature, the Apps to which the packets may belong also include the Meituan App.
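The matching in step 103 can be sketched as a set-overlap lookup. This is a simplified illustration that assumes features are plain string tokens; the `FEATURE_DB` contents and the function name are hypothetical.

```python
from typing import Dict, List, Set

# Hypothetical feature database: app name -> second content features
# (tokens extracted from that app's local-library traffic).
FEATURE_DB: Dict[str, Set[str]] = {
    "Taobao": {"taobao"},
    "Meituan": {"meituan"},
    "Alipay": {"alipay"},
}


def match_candidate_apps(first_features: Set[str],
                         feature_db: Dict[str, Set[str]]) -> List[str]:
    """Return every app whose second content features overlap the packet's features."""
    return sorted(app for app, feats in feature_db.items()
                  if first_features & feats)


# Packets containing both "taobao" and "meituan" yield two candidate apps.
candidates = match_candidate_apps({"taobao", "meituan"}, FEATURE_DB)
```

The result is only a candidate list; narrowing it to a single App is the job of step 104.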

Step 104: according to the second behavior feature of the at least one App in the feature database, determine, from the at least one App, the App to which the to-be-identified traffic data packets belong; the second behavior feature characterizes the App's calls to third-party databases.

A traffic data packet may contain both local-library traffic and external-library traffic, and in an unsupervised scenario the attribute of the traffic cannot be determined from its content alone. During traffic identification the attribute is therefore quite likely to be misjudged, which seriously degrades identification accuracy. The feature database consequently needs finer-grained features (namely the second behavior features) to overcome the inability to determine traffic attributes during feature identification and to improve identification accuracy.

The at least one App associated with the to-be-identified traffic data packets is only a preliminary result of the traffic identification process. At this point only content features have been considered and the attributes of the traffic have not been distinguished, so the result contains a large number of false identifications.

Here, the Apps in the at least one App are screened one by one according to their second behavior features in the feature database, so that the App to which the to-be-identified traffic data packets belong can finally be determined. Specifically, a target behavior feature is obtained from the second behavior features of the at least one App, the target behavior feature being one whose calling behavior is associated with a set of Apps that includes all Apps associated with the to-be-identified traffic data packets; the App corresponding to the target behavior feature is obtained from the feature database; and that App is determined as the App to which the to-be-identified traffic data packets belong.

For example, suppose the at least one App includes the Taobao App and the Alipay App. The second behavior feature of the Taobao App includes a call from the Taobao App to the Alipay App, so this calling behavior is associated with both the Taobao App and the Alipay App. The second behavior feature of the Alipay App does not include a call from the Alipay App to the Taobao App, so only the Taobao App acts as the caller in this calling behavior. The target behavior feature is therefore the second behavior feature of the Taobao App, and the Taobao App is the App to which the to-be-identified traffic data packets belong.

It should be noted that when the second behavior features of several Apps each have calling behavior associated with all Apps associated with the to-be-identified traffic data packets, the target behavior feature may be determined according to how much traffic corresponds to each of those Apps. Specifically, the second behavior feature of the App that accounts for the largest share of the to-be-identified traffic data packets may be determined as the target behavior feature. For example, suppose the second behavior feature of the Taobao App includes a call from the Taobao App to the Alipay App, and the second behavior feature of the Alipay App includes a call from the Alipay App to the Taobao App, but the Taobao App accounts for more of the traffic in the to-be-identified traffic data packets than the Alipay App does. The target behavior feature may then be the second behavior feature of the Taobao App.
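The selection rule of step 104, including the traffic-share tie-break, can be sketched as follows. This is a minimal illustration under simplifying assumptions: each second behavior feature is reduced to the set of Apps that the App calls, and `traffic_share` is a hypothetical per-App fraction of the to-be-identified packets. The function name and data layout are not taken from the source.

```python
from typing import Dict, List, Optional, Set


def pick_owner_app(candidates: List[str],
                   calls: Dict[str, Set[str]],
                   traffic_share: Dict[str, float]) -> Optional[str]:
    """Pick the candidate whose calling behavior covers all candidate apps.

    An app "covers" the candidates when the app itself plus the apps it
    calls include every candidate. If several apps qualify, the one with
    the largest share of the observed traffic wins.
    """
    wanted = set(candidates)
    covering = [app for app in candidates
                if wanted <= ({app} | calls.get(app, set()))]
    if not covering:
        return None  # no behavior feature explains all candidates
    return max(covering, key=lambda app: traffic_share.get(app, 0.0))


# Taobao calls Alipay, Alipay calls nothing: Taobao owns the traffic.
one_way = pick_owner_app(["Taobao", "Alipay"],
                         {"Taobao": {"Alipay"}, "Alipay": set()},
                         {})

# Both call each other: the larger traffic share decides.
two_way = pick_owner_app(["Taobao", "Alipay"],
                         {"Taobao": {"Alipay"}, "Alipay": {"Taobao"}},
                         {"Taobao": 0.7, "Alipay": 0.3})
```

Returning `None` when no App covers all candidates is a choice made for this sketch; the source does not say how that case is handled.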

In summary, in the embodiments of the present invention, feature extraction is first performed on the to-be-identified traffic data packets to obtain the first content feature; the first content feature is then matched against the second content feature of each App in the pre-built feature database to determine the at least one App associated with the to-be-identified traffic data packets; finally, according to the second behavior feature of the at least one App in the feature database, the App to which the to-be-identified traffic data packets belong is determined from the at least one App.

In the embodiments of the present invention, the pre-built feature database includes not only the second content feature of each App but also the second behavior feature of each App. The second content feature is extracted from the traffic data packets generated when the App calls its local database, and the second behavior feature characterizes the App's calls to third-party databases. Different types of features are therefore extracted from an App's traffic data packets on the basis of both local-library traffic and external-library traffic, so that the features in the feature database are finer-grained and more resistant to interference. This refines feature identification and, in turn, markedly improves the accuracy of traffic identification. In addition, the traffic identification method of the embodiments is robust: it can handle the differences between Apps at the traffic level and can identify the traffic of the vast majority of Apps in a general way.

To ensure that the first content feature can be matched in the pre-built feature database, the feature database needs to contain the second content features and second behavior features of a large number of Apps; and for the sake of identification accuracy, the features in the feature database must also be valid. How to obtain valid traffic samples for a large number of Apps, and which traffic feature extraction method to use, are the key factors in building the feature database. The valid traffic samples include the traffic data packets generated when an App calls its local database.

At present there are two main methods of obtaining valid traffic samples:

The first method labels the traffic samples;

Specifically, the App is first used manually and the traffic generated while it runs is captured; the captured traffic is then filtered on the basis of expert experience, and the App traffic judged to be valid is labeled as the traffic sample of that App.

The second method does not label the traffic samples;

Specifically, the App is used manually several times, the traffic generated while it runs is captured each time, and the traffic from the several captures is spliced into one traffic sample.

There are currently three main traffic feature extraction methods:

The first is traffic feature extraction based on predefined or special ports;

Specifically, network traffic is extracted and identified according to the ports of common network protocols, or according to predefined special ports.

The second is DPI-based traffic feature extraction;

Specifically, feature words, fingerprints, or sequences that satisfy specified conditions are extracted from the traffic content as traffic features, according to established experience and rules.

The third is traffic feature extraction based on a deep learning model;

Specifically, the traffic content is converted, according to a given mapping, into the standard input of a deep learning model, and the model extracts the traffic features automatically.

However, the current methods of obtaining traffic samples have the following drawbacks. Samples must be obtained manually, which severely limits the number of Apps in the feature database. The method that labels traffic samples depends heavily on expert experience and places high demands on the skill of the labeling personnel. The method that does not label traffic samples depends on the volume of local-library traffic being far greater than that of external-library traffic, which cannot be guaranteed in general in practical applications; moreover, using and capturing an App several times in order to splice traffic samples further increases the time required. A method of automatically obtaining valid traffic samples is therefore urgently needed, so that an electronic device can automatically obtain valid traffic samples for a large number of Apps.

The current traffic feature extraction methods also have the following drawbacks:

For the extraction method based on predefined or special ports: the network protocols used by Apps are mostly a few common types, such as HTTP, SSL, and HTTPS, so traffic identification based on non-standard or newly defined ports is too coarse-grained to locate the App that generated the sample traffic. The DPI-based extraction method is supervised or semi-supervised; determining the feature words or fingerprints is time-consuming and labor-intensive, and in the presence of noise neither feature selection nor identification performs well. For the extraction method based on a deep learning model, the input must be screened and labeled manually, which makes it unsuitable for large-scale automated feature extraction and identification of network applications; in addition, model training is time-consuming and the method lacks a theoretical explanation of its effectiveness.

A traffic feature extraction method is therefore urgently needed whose extracted features are finer-grained, cover a broader range, and are more resistant to interference.

How the feature database is built is described in detail below.

Specifically, FIG. 2 is a schematic flowchart of a specific implementation of building the feature database in Embodiment 2 of the present invention. Referring to FIG. 2, building the feature database in the embodiment of the present invention includes the following steps:

Step 201: for each App, obtain the corresponding native traffic, the native traffic being the traffic generated by the App while it runs in a simulated environment;

Here, the simulated environment may be a clean sandbox environment. Web crawler technology and sandbox control technology are used to capture automatically the native traffic of each App running in the clean sandbox environment.

This step mainly uses two modules: an APK crawler module and a human-like triggering module.

The APK crawler module uses the description information of each App to crawl the installation package (that is, the APK file) of the corresponding App automatically from network sites according to specified requirements.

The human-like triggering module captures the native traffic of the App automatically and saves it to a specified file. Specifically, the App is first installed from its installation package in a clean emulator environment and triggered with an effective human-like triggering method (such as the monkey command, or the PUMA method, which triggers components depth-first and follows a more human-like triggering logic) so that the App runs automatically. A traffic capture process (such as tcpdump or a process with similar functions) is then started in the background of the emulator to capture the traffic generated while the App runs. Finally, after the background traffic of the emulator has been filtered out, the captured traffic is saved to the specified file, which yields the native traffic of the App.
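The install, trigger, and capture sequence above could be driven from a host script. The sketch below only builds the command lines and does not execute them. It assumes an Android emulator reachable through `adb` and an emulator image that ships a `tcpdump` binary, neither of which the source specifies; the package name and file paths are placeholders.

```python
from typing import List


def install_command(apk_path: str) -> List[str]:
    """adb command that installs an APK on the connected emulator."""
    return ["adb", "install", apk_path]


def monkey_command(package: str, event_count: int) -> List[str]:
    """adb command that sends pseudo-random UI events to one package."""
    return ["adb", "shell", "monkey", "-p", package, "-v", str(event_count)]


def tcpdump_command(capture_path: str) -> List[str]:
    """adb command that records traffic on all interfaces to a pcap file.

    Assumes tcpdump is available inside the emulator image.
    """
    return ["adb", "shell", "tcpdump", "-i", "any", "-w", capture_path]


# Placeholder values, for illustration only.
steps = [
    install_command("/tmp/example.apk"),
    tcpdump_command("/sdcard/example.pcap"),
    monkey_command("com.example.app", 500),
]
```

Each list can be passed to `subprocess.run` or `subprocess.Popen`; the capture would normally be started in the background before the monkey run and stopped afterwards. Filtering out the emulator's background traffic is not shown.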

This step replaces manual triggering with automatic human-like triggering, which automates the acquisition of traffic samples and makes it possible to obtain the native traffic of a large number of Apps.

Step 202: filter the native traffic to obtain the valid traffic samples of the corresponding App;

Here, the valid traffic samples of the corresponding App may be selected from the App's native traffic by a semantic screening method. One-level, two-level, three-level, or higher-level semantic screening may be used to obtain the valid traffic samples; this embodiment is described in detail using four-level semantic screening as an example.

Specifically, semantic analysis is performed on the Nth piece of feature information of the App to obtain first semantic information, where N is a positive integer. Semantic analysis is performed on the content of each to-be-analyzed traffic data packet that corresponds to the Nth piece of feature information, to obtain second semantic information of that packet. The to-be-analyzed traffic data packets are the packets in the native traffic that were not determined to be valid traffic data packets after filtering with the preceding N-1 pieces of feature information; a valid traffic data packet is a traffic data packet in the valid traffic samples. The degree of association between the first semantic information and the second semantic information is calculated, and each to-be-analyzed traffic data packet whose degree of association is greater than or equal to a preset value is determined to be a traffic data packet in the valid traffic samples.

The Nth piece of feature information may be the name, the package name, the web page text content, or a key field, where a key field is a field in the traffic content that identifies the App. For example, in the native traffic of the Xingrui Entertainment App, the special field "stareal" is present, and it is present only in the traffic of the Xingrui Entertainment App; that is, the special field "stareal" has a unique correspondence with the Xingrui Entertainment App, so it is a key field.

Four-level semantic screening is described in detail below as an example.

Let the first piece of feature information be the name, the second the package name, the third the web page text content, and the fourth the key field. Other orders of execution are also possible; no limitation is imposed here.

First, semantic analysis is performed on the name of the App to obtain the semantic information of the name, and on the content of each first to-be-analyzed traffic data packet that corresponds to the name to obtain the semantic information of that packet; the first to-be-analyzed traffic data packets are all traffic data packets in the native traffic. The degree of association between the semantic information of the name and the semantic information of each first to-be-analyzed traffic data packet is calculated. Each first to-be-analyzed traffic data packet whose degree of association is greater than or equal to the preset value is determined to be a traffic data packet in the valid traffic samples, and the remaining first to-be-analyzed traffic data packets are determined to be the second to-be-analyzed traffic data packets.

For example, if a traffic data packet containing the field content "taobao" is found in the native traffic of the Taobao App, the degree of association between the semantic information of that field content and the semantic information of the name of the Taobao App is calculated to be high. When the degree of association satisfies a given condition (for example, it is greater than or equal to a threshold), the attribute of the traffic data packet is judged to be local-library traffic, that is, the packet is a traffic data packet in the valid traffic samples.

Next, semantic analysis is performed on the package name of the App to obtain the semantic information of the package name, and on the content of each second to-be-analyzed traffic data packet that corresponds to the package name to obtain the semantic information of that packet; the second to-be-analyzed traffic data packets are the packets in the native traffic that remain after the first level of screening. The degree of association between the semantic information of the package name and the semantic information of each second to-be-analyzed traffic data packet is calculated. Each second to-be-analyzed traffic data packet whose degree of association is greater than or equal to the preset value is determined to be a traffic data packet in the valid traffic samples, and the remaining second to-be-analyzed traffic data packets are determined to be the third to-be-analyzed traffic data packets.

For example, if a traffic data packet containing the field content "autoavi" is found in the native traffic of the AutoNavi Map App, this content is semantically related to the package name of the AutoNavi Map App (com.autoavi.minimap), and the degree of association between the semantic information of that field content and the semantic information of the package name is calculated to be high. When the degree of association satisfies a given condition, the attribute of the traffic data packet is judged to be local-library traffic, that is, the packet is a traffic data packet in the valid traffic samples.

Next, semantic analysis is performed on the web page text content of the App to obtain the semantic information of the web page text content, and on the content of each third to-be-analyzed traffic data packet that corresponds to the web page text content to obtain the semantic information of that packet; the third to-be-analyzed traffic data packets are the packets in the native traffic that remain after the second level of screening. The degree of association between the semantic information of the web page text content and the semantic information of each third to-be-analyzed traffic data packet is calculated. Each third to-be-analyzed traffic data packet whose degree of association is greater than or equal to the preset value is determined to be a traffic data packet in the valid traffic samples, and the remaining third to-be-analyzed traffic data packets are determined to be the fourth to-be-analyzed traffic data packets.

For example, in the native traffic of the Car Owner Worry-free App, the traffic content contains nothing that is semantically related to the App's name (车主无忧, Car Owner Worry-free) or its package name (com.starbaba.starbaba). The traffic content does, however, refer to web pages, such as the URL www.xmiles.com, whose text content contains keywords of the Car Owner Worry-free App or other semantically related content. The degree of association between the semantic information of the field content containing the URL www.xmiles.com and the semantic information of the App's web page text content can therefore still be calculated and is found to be high. When the degree of association satisfies a given condition, the attribute of the traffic data packet is judged to be local-library traffic, that is, the packet is a traffic data packet in the valid traffic samples.

Finally, semantic analysis is performed on the key field of the App to obtain the semantic information of the key field, and on the content of each fourth to-be-analyzed traffic data packet that corresponds to the key field to obtain the semantic information of that packet; the fourth to-be-analyzed traffic data packets are the packets in the native traffic that remain after the third level of screening. The degree of association between the semantic information of the key field and the semantic information of each fourth to-be-analyzed traffic data packet is calculated, and each fourth to-be-analyzed traffic data packet whose degree of association is greater than or equal to the preset value is determined to be a traffic data packet in the valid traffic samples.

For example, in the native traffic of the Xingrui Entertainment App, the special field "stareal" is present, and it is present only in the traffic of the Xingrui Entertainment App; that is, the special field "stareal" has a unique correspondence with the Xingrui Entertainment App. A traffic data packet containing this field can therefore be judged to be local-library traffic of the Xingrui Entertainment App, that is, a traffic data packet in the valid traffic samples.

The four-level semantic screening described above filters the native traffic and determines the attribute of each traffic data packet, finally yielding the valid traffic samples of the App, so that valid traffic samples are obtained fully automatically.
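The cascade structure of the multi-level screening can be sketched as below. The source does not specify how the degree of association is computed, so `token_relevance` is a deliberately crude stand-in (the fraction of feature tokens found in the packet text); the function names, the 0.5 threshold, and the sample data are all assumptions made for illustration.

```python
from typing import Callable, List, Sequence, Tuple


def token_relevance(feature: str, payload: str) -> float:
    """Toy score: fraction of the feature's tokens that occur in the payload."""
    tokens = [t for t in feature.lower().replace(".", " ").split() if t]
    if not tokens:
        return 0.0
    text = payload.lower()
    return sum(t in text for t in tokens) / len(tokens)


def cascade_filter(packets: Sequence[str],
                   feature_levels: Sequence[str],
                   threshold: float = 0.5,
                   relevance: Callable[[str, str], float] = token_relevance
                   ) -> Tuple[List[str], List[str]]:
    """Screen packets level by level; return (valid packets, leftover packets).

    Level N only examines packets that levels 1..N-1 did not accept.
    """
    valid: List[str] = []
    pending = list(packets)
    for feature in feature_levels:
        still_pending = []
        for packet in pending:
            if relevance(feature, packet) >= threshold:
                valid.append(packet)
            else:
                still_pending.append(packet)
        pending = still_pending
    return valid, pending


# Hypothetical app: name, package name, and a key field as three levels.
packets = ["host=shop.demo.example path=/home",
           "host=ads.tracker.example",
           "x-key=dsk123"]
levels = ["demo shop", "com.demo.shop", "dsk123"]
valid, leftover = cascade_filter(packets, levels)
```

Packets left over after the last level are treated here as external-library traffic; a production system would replace the toy score with a real semantic similarity measure.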

Step 203: perform content feature extraction on the valid traffic samples to obtain the second content feature;

Here, a fixed-position content extraction method may be used to extract content features from the valid traffic samples. Other content feature extraction methods may also be chosen as needed, such as a DPI method, a longest common subsequence extraction method, or a feature mining method from deep learning; no limitation is imposed here.

Step 204: obtain the second behavior feature of the corresponding App according to the previously obtained second content features of the Apps and the native traffic;

FIG. 3 is a schematic flowchart of a specific implementation of step 204 in the flow shown in FIG. 2. Referring to FIG. 3, step 204 includes the following steps:

Step 2041: perform content feature extraction on the native traffic to obtain the content features of the native traffic;

A fixed-position content extraction method may be used to extract content features from the native traffic. Other content feature extraction methods may also be chosen as needed, such as a DPI method, a longest common subsequence extraction method, or a feature mining method from deep learning; no limitation is imposed here.

步骤2042,将所述原生流量的内容特征和预先获取的App的第二内容特征进行匹配,确定所述原生流量关联的App;Step 2042, matching the content characteristics of the native traffic with the second content characteristics of the pre-acquired App to determine the App associated with the native traffic;

步骤2043,根据所述App的描述信息和所述原生流量关联的App,获得所述第二行为特征。Step 2043: Obtain the second behavior feature according to the description information of the App and the App associated with the native traffic.

这里,所述第二行为特征包括App对第三方数据库的调用行为、以及调用的时间次序和频率等结构性信息。Here, the second behavior feature includes the calling behavior of the App to the third-party database, and structural information such as the time sequence and frequency of the calls.

该步骤通过多级异步的方式对流量特征进行提取,首先对提取的有效流量样本进行内容特征提取,确定App的有效流量样本的第二内容特征;然后,根据预先提取的各App的第二内容特征和所述App的原生流量的内容特征,确定所述原生流量关联的App;最后,基于App的描述信息和所述原生流量关联的App,确定所述第二行为特征。从而得到了App多层次的流量特征,进而能够获得更加精准的流量识别效果。In this step, the traffic features are extracted in a multi-level asynchronous manner. First, content feature extraction is performed on the extracted valid traffic samples to determine the second content features of the valid traffic samples of the App; then, according to the pre-extracted second content of each App The feature and the content feature of the native traffic of the App are used to determine the App associated with the native traffic; finally, the second behavior feature is determined based on the description information of the App and the App associated with the native traffic. As a result, the multi-level traffic characteristics of the app are obtained, and a more accurate traffic identification effect can be obtained.
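Steps 2041 to 2043 can be illustrated with a short sketch. It rests on several assumptions that are not in the source: native traffic is modelled as `(timestamp, payload)` pairs, a content feature is a byte signature matched by substring search, and the App's own name stands in for its description information so that its own packets are not counted as third-party calls.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class BehaviourFeature:
    """Hypothetical container: call order and call frequency of third parties."""
    call_order: list = field(default_factory=list)    # third parties, by first call
    call_counts: dict = field(default_factory=dict)   # third party -> number of calls


def match_app(payload: bytes, content_features: dict) -> Optional[str]:
    """Step 2042 (sketch): return the first App whose signature occurs in the payload."""
    for app, signatures in content_features.items():
        if any(sig in payload for sig in signatures):
            return app
    return None


def derive_behaviour_feature(app_name: str, native_traffic: list,
                             content_features: dict) -> BehaviourFeature:
    """Step 2043 (sketch): record which other Apps the native traffic touches."""
    counts = Counter()
    order = []
    # Process packets in time order so that call_order reflects the sequence.
    for _, payload in sorted(native_traffic, key=lambda item: item[0]):
        matched = match_app(payload, content_features)
        if matched is None or matched == app_name:
            continue  # unknown traffic, or the App's own traffic
        if matched not in counts:
            order.append(matched)
        counts[matched] += 1
    return BehaviourFeature(order, dict(counts))
```

For an App named `shop` whose captured traffic also contains packets matching the signatures of `map` and `pay`, the result records the order in which those two were first called and how often each was called.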

Step 205: construct the feature database by using the second content feature and the second behavior feature of each App.

Here, the feature database can be constructed by storing the description information of each App in association with its second content feature and second behavior feature.
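The associated storage described here could look like the following in-memory sketch. The class names, field names, and the use of a dictionary keyed by App name are assumptions for illustration; the source does not specify a storage format.

```python
from dataclasses import dataclass


@dataclass
class AppRecord:
    """One entry of the feature database (hypothetical layout)."""
    description: dict         # description information of the App
    content_features: list    # second content features (byte signatures)
    behaviour_feature: dict   # second behavior feature, e.g. call counts


class FeatureDatabase:
    """Keeps description, content feature and behavior feature together per App."""

    def __init__(self) -> None:
        self._records = {}

    def add(self, app_name: str, description: dict,
            content_features: list, behaviour_feature: dict) -> None:
        self._records[app_name] = AppRecord(description, content_features,
                                            behaviour_feature)

    def get(self, app_name: str) -> AppRecord:
        return self._records[app_name]

    def apps(self) -> list:
        return list(self._records)
```

Keeping the three parts in one record means that a content-feature match during identification leads directly to the behavior feature needed in the later determination step.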

To implement the method of the embodiments of the present invention, an embodiment of the present invention further provides a traffic identification device, which implements the specific details of the above traffic identification method and achieves the same effects.

FIG. 4 is a schematic structural diagram of a traffic identification device according to an embodiment of the present invention. Referring to FIG. 4, the traffic identification device in this embodiment includes: a first acquisition module 31, a first feature extraction module 32, a first determination module 33, and a second determination module 34; wherein,

The first acquisition module 31 is configured to acquire the traffic data packet to be identified that is generated when an electronic device is used;

The first feature extraction module 32 is configured to perform feature extraction on the traffic data packet to be identified to obtain a first content feature; the first content feature characterizes the content features of the application (App) associated with the traffic data packet to be identified;

The first determination module 33 is configured to match the first content feature against the second content feature of each App in a pre-built feature database, and to determine at least one App associated with the traffic data packet to be identified; the second content feature is a content feature extracted from the valid traffic samples of an App, and the valid traffic samples include traffic data packets generated when the App calls a local database;

The second determination module 34 is configured to determine, from the at least one App and according to the second behavior feature of the at least one App in the feature database, the App to which the traffic data packet to be identified belongs; the second behavior feature characterizes the App's calling behavior toward third-party databases.
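The role of the first determination module 33 can be sketched as a lookup that may return more than one App, which is why the second determination module 34 is needed afterwards. The substring-based matching rule and the function name below are assumptions; the source leaves the matching method open.

```python
def candidate_apps(first_content_feature: bytes,
                   second_content_features: dict) -> list:
    """Return every App whose stored signature occurs in the extracted feature.

    second_content_features maps an App name to a list of byte signatures
    (a hypothetical representation of the feature database content).
    """
    return [
        app
        for app, signatures in second_content_features.items()
        if any(sig in first_content_feature for sig in signatures)
    ]
```

If several Apps embed the same third-party component, a packet produced by that component matches all of them, and the result is a list of candidates rather than a single App.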

Optionally, the traffic identification device in this embodiment further includes: a second acquisition module 35, a filtering module 36, a second feature extraction module 37, a third acquisition module 38, and a construction module 39; wherein,

The second acquisition module 35 is configured to acquire, for each App, the corresponding native traffic, where the native traffic is the traffic generated by the corresponding App while it runs in a simulated environment;

The filtering module 36 is configured to filter the native traffic to obtain the valid traffic samples of the corresponding App;

The second feature extraction module 37 is configured to perform content feature extraction on the valid traffic samples to obtain the second content feature of the corresponding App;

The third acquisition module 38 is configured to obtain the second behavior feature of the corresponding App according to the pre-acquired second content features of the Apps and the native traffic;

The construction module 39 is configured to construct the feature database by using the second content feature and the second behavior feature of each App.

Optionally, the filtering module 36 is specifically configured to: perform semantic analysis on the Nth piece of feature information of the App to obtain first semantic information, where N is a positive integer; perform semantic analysis on the content corresponding to the Nth piece of feature information in a traffic data packet to be analyzed, to obtain second semantic information of the traffic data packet to be analyzed, where the traffic data packet to be analyzed is a traffic data packet in the native traffic that has not been determined to be a valid traffic data packet after filtering by N-1 pieces of feature information, and a valid traffic data packet is a traffic data packet in the valid traffic samples; calculate the degree of association between the first semantic information and the second semantic information; and determine the traffic data packets to be analyzed whose degree of association is greater than or equal to a preset value as traffic data packets in the valid traffic samples.
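The level-by-level filtering performed by the filtering module 36 can be sketched as follows. The source does not define how semantic analysis or the degree of association is computed, so this example substitutes a simple stand-in: text is reduced to a set of lowercase tokens, and the association is the Jaccard similarity of two token sets. The packet representation (a dictionary of text fields) and all names are likewise assumptions.

```python
import re


def tokens(text: str) -> set:
    """Stand-in for semantic analysis: lowercase alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def association(first: set, second: set) -> float:
    """Stand-in for the degree of association: Jaccard similarity."""
    if not first or not second:
        return 0.0
    return len(first & second) / len(first | second)


def filter_native_traffic(app_features: list, field_names: list,
                          packets: list, threshold: float) -> list:
    """Apply the feature levels in order; undecided packets move to the next level."""
    valid = []
    pending = list(packets)
    for feature, field_name in zip(app_features, field_names):
        first = tokens(feature)                       # first semantic information
        still_pending = []
        for packet in pending:
            second = tokens(packet.get(field_name, ""))  # second semantic information
            if association(first, second) >= threshold:
                valid.append(packet)                  # becomes part of the valid samples
            else:
                still_pending.append(packet)          # analysed again at level N+1
        pending = still_pending
    return valid
```

Only packets that no earlier level accepted are examined at level N, which mirrors the definition of the traffic data packet to be analyzed given above.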

Optionally, FIG. 5 is a schematic structural diagram of the third acquisition module in the device shown in FIG. 4. Referring to FIG. 5, the third acquisition module 38 includes: a content feature extraction unit 381, a matching unit 382, and an obtaining unit 383; wherein,

The content feature extraction unit 381 is configured to perform content feature extraction on the native traffic to obtain the content features of the native traffic;

The matching unit 382 is configured to match the content features of the native traffic against the pre-acquired second content features of the Apps to determine the App associated with the native traffic;

The obtaining unit 383 is configured to obtain the second behavior feature according to the description information of the App and the App associated with the native traffic.

Optionally, the second determination module 34 is specifically configured to: obtain a target behavior feature from the second behavior features of the at least one App, where the Apps associated with the calling behavior in the target behavior feature include all Apps associated with the traffic data packet to be identified; obtain, from the feature database, the App corresponding to the target behavior feature; and determine the App corresponding to the target behavior feature as the App to which the traffic data packet to be identified belongs.
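One possible reading of this selection rule is sketched below. Each candidate's behavior feature is reduced to the set of Apps its calls are associated with, and the target is the candidate whose set covers every candidate. Treating a candidate as covering itself is an assumption of this sketch, as are the names and the set representation.

```python
from typing import Optional


def owning_app(candidates: set, behaviour_features: dict) -> Optional[str]:
    """Pick the candidate whose calling behavior accounts for all candidates.

    behaviour_features maps an App name to the set of Apps associated with
    its calling behavior (hypothetical representation).
    """
    for app in sorted(candidates):  # sorted for a deterministic result
        covered = behaviour_features.get(app, set()) | {app}
        if candidates <= covered:
            return app
    return None
```

If no candidate covers the whole set, the sketch returns `None`; the source does not describe what happens in that case.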

In practical applications, the first acquisition module 31, the first feature extraction module 32, the first determination module 33, the second determination module 34, the second acquisition module 35, the filtering module 36, the second feature extraction module 37, the third acquisition module 38, the construction module 39, as well as the content feature extraction unit 381, the matching unit 382, and the obtaining unit 383, may all be implemented by a processor located in the traffic identification device.

When the traffic identification device provided by the above embodiment performs traffic identification, the division into the program modules described above is only an example. In practical applications, the above processing may be assigned to different program modules as needed; that is, the internal structure of the device may be divided into different program modules to complete all or part of the processing described above. In addition, the traffic identification device provided by the above embodiment and the traffic identification method embodiments belong to the same concept; for the specific implementation process, see the method embodiments, which is not repeated here.

FIG. 6 is a schematic diagram of the hardware structure of a traffic identification device according to an embodiment of the present invention. Referring to FIG. 6, the traffic identification device in this embodiment includes: a processor 41, and a memory 42 for storing a computer program capable of running on the processor 41; wherein,

The processor 41 is configured to execute the following when running the computer program:

acquiring the traffic data packet to be identified that is generated when an electronic device is used;

performing feature extraction on the traffic data packet to be identified to obtain a first content feature; the first content feature characterizes the content features of the application (App) associated with the traffic data packet to be identified;

matching the first content feature against the second content feature of each App in a pre-built feature database, and determining at least one App associated with the traffic data packet to be identified; the second content feature is a content feature extracted from the valid traffic samples of an App, and the valid traffic samples include traffic data packets generated when the App calls a local database;

determining, from the at least one App and according to the second behavior feature of the at least one App in the feature database, the App to which the traffic data packet to be identified belongs; the second behavior feature characterizes the App's calling behavior toward third-party databases.

Optionally, the processor 41 is configured to execute the following when running the computer program:

acquiring, for each App, the corresponding native traffic, where the native traffic is the traffic generated by the corresponding App while it runs in a simulated environment;

filtering the native traffic to obtain the valid traffic samples of the corresponding App;

performing content feature extraction on the valid traffic samples to obtain the second content feature of the corresponding App;

obtaining the second behavior feature of the corresponding App according to the pre-acquired second content features of the Apps and the native traffic;

constructing the feature database by using the second content feature and the second behavior feature of each App.

Optionally, the processor 41 is configured to execute the following when running the computer program:

performing semantic analysis on the Nth piece of feature information of the App to obtain first semantic information, where N is a positive integer;

performing semantic analysis on the content corresponding to the Nth piece of feature information in a traffic data packet to be analyzed, to obtain second semantic information of the traffic data packet to be analyzed; the traffic data packet to be analyzed is a traffic data packet in the native traffic that has not been determined to be a valid traffic data packet after filtering by N-1 pieces of feature information; a valid traffic data packet is a traffic data packet in the valid traffic samples;

calculating the degree of association between the first semantic information and the second semantic information;

determining the traffic data packets to be analyzed whose degree of association is greater than or equal to a preset value as traffic data packets in the valid traffic samples.

Optionally, the processor 41 is configured to execute the following when running the computer program:

performing content feature extraction on the native traffic to obtain the content features of the native traffic;

matching the content features of the native traffic against the pre-acquired second content features of the Apps to determine the App associated with the native traffic;

obtaining the second behavior feature according to the description information of the App and the App associated with the native traffic.

Optionally, the processor 41 is configured to execute the following when running the computer program:

obtaining a target behavior feature from the second behavior features of the at least one App, where the Apps associated with the calling behavior in the target behavior feature include all Apps associated with the traffic data packet to be identified;

obtaining, from the feature database, the App corresponding to the target behavior feature;

determining the App corresponding to the target behavior feature as the App to which the traffic data packet to be identified belongs.

Of course, in practical applications, as shown in FIG. 6, the components are coupled together through a bus system 43. It can be understood that the bus system 43 is used to implement connection and communication between these components. In addition to a data bus, the bus system 43 includes a power bus, a control bus, and a status signal bus. For clarity of illustration, however, the various buses are all labeled as the bus system 43 in FIG. 6.

An embodiment of the present invention further provides a computer-readable storage medium storing an executable program which, when executed by the processor 41, implements the following steps:

acquiring the traffic data packet to be identified that is generated when an electronic device is used;

performing feature extraction on the traffic data packet to be identified to obtain a first content feature; the first content feature characterizes the content features of the application (App) associated with the traffic data packet to be identified;

matching the first content feature against the second content feature of each App in a pre-built feature database, and determining at least one App associated with the traffic data packet to be identified; the second content feature is a content feature extracted from the valid traffic samples of an App, and the valid traffic samples include traffic data packets generated when the App calls a local database;

determining, from the at least one App and according to the second behavior feature of the at least one App in the feature database, the App to which the traffic data packet to be identified belongs; the second behavior feature characterizes the App's calling behavior toward third-party databases.

Optionally, the executable program, when executed by the processor 41, implements the following steps:

acquiring, for each App, the corresponding native traffic, where the native traffic is the traffic generated by the corresponding App while it runs in a simulated environment;

filtering the native traffic to obtain the valid traffic samples of the corresponding App;

performing content feature extraction on the valid traffic samples to obtain the second content feature of the corresponding App;

obtaining the second behavior feature of the corresponding App according to the pre-acquired second content features of the Apps and the native traffic;

constructing the feature database by using the second content feature and the second behavior feature of each App.

Optionally, the executable program, when executed by the processor 41, specifically implements the step of filtering the native traffic to obtain the valid traffic samples of the corresponding App as follows:

performing semantic analysis on the Nth piece of feature information of the App to obtain first semantic information, where N is a positive integer;

performing semantic analysis on the content corresponding to the Nth piece of feature information in a traffic data packet to be analyzed, to obtain second semantic information of the traffic data packet to be analyzed; the traffic data packet to be analyzed is a traffic data packet in the native traffic that has not been determined to be a valid traffic data packet after filtering by N-1 pieces of feature information; a valid traffic data packet is a traffic data packet in the valid traffic samples;

calculating the degree of association between the first semantic information and the second semantic information;

determining the traffic data packets to be analyzed whose degree of association is greater than or equal to a preset value as traffic data packets in the valid traffic samples.

Optionally, the executable program, when executed by the processor 41, specifically implements the step of obtaining the second behavior feature of the corresponding App according to the pre-acquired second content features of the Apps and the native traffic as follows:

performing content feature extraction on the native traffic to obtain the content features of the native traffic;

matching the content features of the native traffic against the pre-acquired second content features of the Apps to determine the App associated with the native traffic;

obtaining the second behavior feature according to the description information of the App and the App associated with the native traffic.

Optionally, the executable program, when executed by the processor 41, specifically implements the step of determining, from the at least one App and according to the second behavior feature of the at least one App in the feature database, the App to which the traffic data packet to be identified belongs as follows:

obtaining a target behavior feature from the second behavior features of the at least one App, where the Apps associated with the calling behavior in the target behavior feature include all Apps associated with the traffic data packet to be identified;

obtaining, from the feature database, the App corresponding to the target behavior feature;

determining the App corresponding to the target behavior feature as the App to which the traffic data packet to be identified belongs.

The above are merely preferred embodiments of the present invention and are not intended to limit the scope of protection of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and scope of the present invention shall fall within the scope of protection of the present invention.

Claims (12)

1. A traffic identification method, characterized in that the method comprises:
acquiring the traffic data packet to be identified that is generated when an electronic device is used;
performing feature extraction on the traffic data packet to be identified to obtain a first content feature; the first content feature characterizes the content features of the application (App) associated with the traffic data packet to be identified;
matching the first content feature against the second content feature of each App in a pre-built feature database, and determining at least one App associated with the traffic data packet to be identified; the second content feature is a content feature extracted from the valid traffic samples of an App, and the valid traffic samples include traffic data packets generated when the App calls a local database;
determining, from the at least one App and according to the second behavior feature of the at least one App in the feature database, the App to which the traffic data packet to be identified belongs; the second behavior feature characterizes the App's calling behavior toward third-party databases.

2. The method according to claim 1, characterized in that the method further comprises:
acquiring, for each App, the corresponding native traffic, where the native traffic is the traffic generated by the corresponding App while it runs in a simulated environment;
filtering the native traffic to obtain the valid traffic samples of the corresponding App;
performing content feature extraction on the valid traffic samples to obtain the second content feature of the corresponding App;
obtaining the second behavior feature of the corresponding App according to the pre-acquired second content features of the Apps and the native traffic;
constructing the feature database by using the second content feature and the second behavior feature of each App.

3. The method according to claim 2, characterized in that filtering the native traffic to obtain the valid traffic samples of the corresponding App comprises:
performing semantic analysis on the Nth piece of feature information of the App to obtain first semantic information, where N is a positive integer;
performing semantic analysis on the content corresponding to the Nth piece of feature information in a traffic data packet to be analyzed, to obtain second semantic information of the traffic data packet to be analyzed; the traffic data packet to be analyzed is a traffic data packet in the native traffic that has not been determined to be a valid traffic data packet after filtering by N-1 pieces of feature information; a valid traffic data packet is a traffic data packet in the valid traffic samples;
calculating the degree of association between the first semantic information and the second semantic information;
determining the traffic data packets to be analyzed whose degree of association is greater than or equal to a preset value as traffic data packets in the valid traffic samples.

4. The method according to claim 2, characterized in that obtaining the second behavior feature of the corresponding App according to the pre-acquired second content features of the Apps and the native traffic comprises:
performing content feature extraction on the native traffic to obtain the content features of the native traffic;
matching the content features of the native traffic against the pre-acquired second content features of the Apps to determine the App associated with the native traffic;
obtaining the second behavior feature according to the description information of the App and the App associated with the native traffic.

5. The method according to claim 1, characterized in that determining, from the at least one App and according to the second behavior feature of the at least one App in the feature database, the App to which the traffic data packet to be identified belongs comprises:
obtaining a target behavior feature from the second behavior features of the at least one App, where the Apps associated with the calling behavior in the target behavior feature include all Apps associated with the traffic data packet to be identified;
obtaining, from the feature database, the App corresponding to the target behavior feature;
determining the App corresponding to the target behavior feature as the App to which the traffic data packet to be identified belongs.

6. A traffic identification device, characterized in that the device comprises: a processor, and a memory for storing a computer program capable of running on the processor; wherein,
the processor is configured to execute the following when running the computer program:
acquiring the traffic data packet to be identified that is generated when an electronic device is used;
performing feature extraction on the traffic data packet to be identified to obtain a first content feature; the first content feature characterizes the content features of the application (App) associated with the traffic data packet to be identified;
matching the first content feature against the second content feature of each App in a pre-built feature database, and determining at least one App associated with the traffic data packet to be identified; the second content feature is a content feature extracted from the valid traffic samples of an App, and the valid traffic samples include traffic data packets generated when the App calls a local database;
determining, from the at least one App and according to the second behavior feature of the at least one App in the feature database, the App to which the traffic data packet to be identified belongs; the second behavior feature characterizes the App's calling behavior toward third-party databases.

7. The device according to claim 6, characterized in that the processor is configured to execute the following when running the computer program:
acquiring, for each App, the corresponding native traffic, where the native traffic is the traffic generated by the corresponding App while it runs in a simulated environment;
filtering the native traffic to obtain the valid traffic samples of the corresponding App;
performing content feature extraction on the valid traffic samples to obtain the second content feature of the corresponding App;
obtaining the second behavior feature of the corresponding App according to the pre-acquired second content features of the Apps and the native traffic;
constructing the feature database by using the second content feature and the second behavior feature of each App.

8. The device according to claim 7, characterized in that the processor is configured to execute the following when running the computer program:
performing semantic analysis on the Nth piece of feature information of the App to obtain first semantic information, where N is a positive integer;
performing semantic analysis on the content corresponding to the Nth piece of feature information in a traffic data packet to be analyzed, to obtain second semantic information of the traffic data packet to be analyzed; the traffic data packet to be analyzed is a traffic data packet in the native traffic that has not been determined to be a valid traffic data packet after filtering by N-1 pieces of feature information; a valid traffic data packet is a traffic data packet in the valid traffic samples;
calculating the degree of association between the first semantic information and the second semantic information;
determining the traffic data packets to be analyzed whose degree of association is greater than or equal to a preset value as traffic data packets in the valid traffic samples.

9. The device according to claim 7, characterized in that the processor is configured to execute the following when running the computer program:
performing content feature extraction on the native traffic to obtain the content features of the native traffic;
matching the content features of the native traffic against the pre-acquired second content features of the Apps to determine the App associated with the native traffic;
obtaining the second behavior feature according to the description information of the App and the App associated with the native traffic.

10. The device according to claim 6, characterized in that the processor is configured to execute the following when running the computer program:
obtaining a target behavior feature from the second behavior features of the at least one App, where the Apps associated with the calling behavior in the target behavior feature include all Apps associated with the traffic data packet to be identified;
obtaining, from the feature database, the App corresponding to the target behavior feature;
determining the App corresponding to the target behavior feature as the App to which the traffic data packet to be identified belongs.

11. A traffic identification device, characterized in that the device comprises: an acquisition module, a feature extraction module, a first determination module, and a second determination module; wherein,
the acquisition module is configured to acquire the traffic data packet to be identified that is generated when an electronic device is used;
the feature extraction module is configured to perform feature extraction on the traffic data packet to be identified to obtain a first content feature; the first content feature characterizes the content features of the application (App) associated with the traffic data packet to be identified;
the first determination module is configured to match the first content feature against the second content feature of each App in a pre-built feature database, and to determine at least one App associated with the traffic data packet to be identified; the second content feature is a content feature extracted from the valid traffic samples of an App, and the valid traffic samples include traffic data packets generated when the App calls a local database;
the second determination module is configured to determine, from the at least one App and according to the second behavior feature of the at least one App in the feature database, the App to which the traffic data packet to be identified belongs; the second behavior feature characterizes the App's calling behavior toward third-party databases.

12. 
A computer-readable storage medium, wherein computer-executable instructions are stored in the computer-readable storage medium, and when a computer program of the computer-executable instructions is executed by a processor, any one of claims 1 to 5 is implemented. A traffic identification method as described in one of the above.
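The two-stage determination recited in claims 10 and 11 (content matching to find the associated Apps, then behavior features to pick the owning App) can be illustrated with a minimal sketch. It is not the patented implementation. It assumes that content features can be represented as sets of string tokens and that a behavior feature can be reduced to the set of other Apps whose services an App calls; the names `AppProfile`, `match_candidates`, `identify_owner` and the example hostnames are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AppProfile:
    """Hypothetical feature-database entry for one App."""
    name: str
    content_features: frozenset  # stands in for the "second content feature"
    called_apps: frozenset       # stands in for the "second behavior feature"


def match_candidates(packet_features, database):
    """Stage 1: names of Apps sharing at least one content feature with the traffic."""
    return {p.name for p in database if p.content_features & packet_features}


def identify_owner(packet_features, database):
    """Stage 2: choose the candidate whose calling behavior covers all candidates."""
    candidates = match_candidates(packet_features, database)
    if not candidates:
        return None
    owners = [
        p.name
        for p in database
        # An App trivially "covers" itself, so add its own name to the call set.
        if p.name in candidates and candidates <= (p.called_apps | {p.name})
    ]
    # Report an owner only when the behavior features single out exactly one App.
    return owners[0] if len(owners) == 1 else None


# Illustrative data only: a ride-hailing App that embeds a map service.
DATABASE = [
    AppProfile("ride", frozenset({"api.ride.example"}), frozenset({"map"})),
    AppProfile("map", frozenset({"tiles.map.example"}), frozenset()),
]

if __name__ == "__main__":
    mixed = {"api.ride.example", "tiles.map.example"}
    print(identify_owner(mixed, DATABASE))
    print(identify_owner({"tiles.map.example"}, DATABASE))
```

In this sketch, traffic carrying features of both Apps is attributed to the App whose recorded calls include the other one, while traffic that matches only the map service is attributed to the map App itself. Returning `None` for ambiguous or unmatched traffic is a design choice of the sketch, not something the claims specify.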
CN201810000615.2A 2018-01-02 2018-01-02 A traffic identification method and device, and a computer-readable storage medium Active CN109995605B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810000615.2A CN109995605B (en) 2018-01-02 2018-01-02 A traffic identification method and device, and a computer-readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810000615.2A CN109995605B (en) 2018-01-02 2018-01-02 A traffic identification method and device, and a computer-readable storage medium

Publications (2)

Publication Number Publication Date
CN109995605A (en) 2019-07-09
CN109995605B (en) 2021-04-13

Family

ID=67128223

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810000615.2A Active CN109995605B (en) 2018-01-02 2018-01-02 A traffic identification method and device, and a computer-readable storage medium

Country Status (1)

Country Link
CN (1) CN109995605B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110677313A (en) * 2019-08-25 2020-01-10 北京亚鸿世纪科技发展有限公司 Method for discovering VPN software background server
CN112532616A (en) * 2020-11-26 2021-03-19 杭州迪普科技股份有限公司 Feature analysis method and device for network application

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101202652A (en) * 2006-12-15 2008-06-18 北京大学 Device and method for classifying and identifying network application traffic
CN103297440A (en) * 2013-06-24 2013-09-11 北京星网锐捷网络技术有限公司 Method, device and network equipment for establishing application traffic feature library
US8995459B1 (en) * 2007-09-07 2015-03-31 Meru Networks Recognizing application protocols by identifying message traffic patterns
CN105099802A (en) * 2014-05-15 2015-11-25 中国移动通信集团公司 Traffic identification method, terminal, and network element equipment
CN106815521A (en) * 2015-12-31 2017-06-09 武汉安天信息技术有限责任公司 Sample relevance detection method, system and electronic equipment
CN106998262A (en) * 2016-10-10 2017-08-01 深圳汇网天下科技有限公司 System and method for recognizing Internet users
CN107291697A (en) * 2017-06-29 2017-10-24 浙江图讯科技股份有限公司 Semantic analysis, electronic equipment, storage medium and its diagnostic system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
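Li Jizong: "Design and Implementation of a Traffic Monitoring System Based on Feature Library Identification", Master's thesis, Shandong University, 15 October 2014 *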

Also Published As

Publication number Publication date
CN109995605B (en) 2021-04-13

Similar Documents

Publication Publication Date Title
CN111092852B (en) Network security monitoring method, device, equipment and storage medium based on big data
CN110417778B (en) Access request processing method and device
CN109933984B (en) Optimal clustering result screening method and device and electronic equipment
WO2017084586A1 (en) Method, system, and device for inferring malicious code rule based on deep learning method
CN113132311B (en) Abnormal access detection method, device and equipment
CN110351248B (en) Safety protection method and device based on intelligent analysis and intelligent current limiting
CN112839014A (en) Method, system, device and medium for establishing a model for identifying abnormal visitor
Yang et al. Power consumption based Android malware detection
CN117056966A (en) System for analyzing consistency of applet privacy policy and authority call
CN114528457A (en) Web fingerprint detection method and related equipment
CN109995605B (en) A traffic identification method and device, and a computer-readable storage medium
CN118194277A (en) Privacy behavior consistency analysis method, device and medium based on APP usage scene
CN112671724A (en) Terminal security detection analysis method, device, equipment and readable storage medium
CN115080972A (en) A method and device for detecting abnormal access to an interface of a power mobile terminal
CN108897739B (en) An intelligent method and system for automatic mining of application traffic identification features
CN115587364A (en) Firmware vulnerability input point location method and device based on front-end and back-end correlation analysis
CN110866700A (en) Method and device for determining enterprise employee information disclosure source
CN105978722A (en) User attribute mining method and device
CN114978674A (en) Crawler identification enhancement method and device, storage medium and electronic equipment
CN111125704B (en) Webpage Trojan horse recognition method and system
CN109271781B (en) Method and system for detecting super authority obtaining behavior of application program based on kernel
CN114006766B (en) Network attack detection method, device, electronic device and readable storage medium
CN113890866B (en) Illegal application software identification method, device, medium and electronic equipment
CN115563617A (en) Source code vulnerability detection method and device
CN114579711A (en) Identification method, device, device and storage medium of fraudulent application

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant