WO2016184163A1 - Method and device for generating a dpi rules - Google Patents

Info

Publication number
WO2016184163A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
feature
dpi
unidentified
service
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/CN2016/072175
Other languages
French (fr)
Chinese (zh)
Inventor
胡斓
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
ZTE Corp
Original Assignee
ZTE Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by ZTE Corp filed Critical ZTE Corp
Publication of WO2016184163A1 publication Critical patent/WO2016184163A1/en
Anticipated expiration legal-status Critical
Ceased legal-status Critical Current

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00: Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/14: Network analysis or design
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00: Arrangements for monitoring or testing data switching networks

Definitions

  • This document relates to, but is not limited to, the field of network data transmission technologies, and in particular, to a method, an apparatus, and a computer readable storage medium for generating DPI rules.
  • DPI Deep Packet Inspection
  • New service applications keep appearing on the mobile Internet, and versions of the same application are updated frequently.
  • DPI rules that identify only the known services in the current DPI rule base therefore cannot meet the needs of service analysis, and cannot accurately identify the service data in the Internet data.
  • Embodiments of the present invention provide a method, an apparatus, and a computer readable storage medium for generating a DPI rule, which are to solve the technical problem that the DPI rule of the DPI rule base cannot accurately identify the service data in the Internet data.
  • a method for generating a DPI rule according to an embodiment of the present invention includes the following steps:
  • the new DPI rules are stored to the DPI rule base.
  • the step of analyzing the unidentified data to obtain the first feature of the unidentified data when the Internet data has unidentified data includes:
  • the acquired service feature is taken as the first feature of the unidentified data.
  • the step of acquiring the second feature of the unidentified data and the step of filtering the unidentified data based on the second feature to obtain the service data further includes:
  • the second user data is used to update the first user data in the unidentified data.
  • the step of analyzing the service data to obtain the service feature of the service data includes:
  • the plurality of the obtained service features are taken as the first feature of the unidentified data.
  • the step of storing the new DPI rule to the DPI rule base includes:
  • the new DPI rule is stored to the DPI rule base.
  • the embodiment of the present invention further provides a device for generating a DPI rule, where the device for generating the DPI rule includes:
  • An identification module configured to obtain Internet data, and identify the Internet data based on an existing DPI rule in a DPI rule base;
  • An analysis module configured to analyze the unidentified data to obtain a first feature of the unidentified data when the Internet data has unidentified data, wherein the unidentified data is the Internet data that is not recognized by the existing DPI rules;
  • a storage module configured to store the new DPI rule to the DPI rule base.
  • the analyzing module includes:
  • a first acquiring unit configured to acquire a second feature of the unidentified data when the Internet data has unidentified data
  • a filtering unit configured to filter the unidentified data based on the second feature to obtain service data
  • the analyzing unit is configured to analyze the service data to obtain a service feature of the service data, and use the acquired service feature as the first feature of the unidentified data.
  • the analyzing module further includes:
  • a second acquiring unit configured to acquire a first target IP address and/or first user data corresponding to the second feature
  • an update unit configured to: when the first target IP address is inconsistent with the second target IP address corresponding to the second feature, use the second target IP address to update the first target IP address in the unidentified data; and/or, when the first user data is inconsistent with the second user data corresponding to the second feature, use the second user data to update the first user data in the unidentified data.
  • the analyzing unit includes:
  • a grouping subunit configured to divide the service data into multiple groups of service data groups
  • a data mining sub-unit configured to perform data mining on the same sequence of load packets of the service data groups, to obtain service features of the plurality of service data groups, and to use the plurality of obtained service features as the first feature of the unidentified data.
  • the storage module includes:
  • a determining unit configured to determine whether the generated new DPI rule conflicts with an existing DPI rule of the DPI rule base
  • the storage unit is configured to store the new DPI rule to the DPI rule base when the generated new DPI rule does not conflict with the existing DPI rule of the DPI rule base.
  • the embodiment of the present invention further provides a computer readable storage medium, which stores program instructions, and when the program instructions are executed by the processor, implements a DPI rule generation method provided by an embodiment of the present invention.
  • the embodiment of the present invention first acquires Internet data and identifies it based on the existing DPI rules of the DPI rule base to obtain the unidentified data; then analyzes the unidentified data to obtain the service feature of the unidentified data; then compiles a new DPI rule based on the service feature; and finally stores the new DPI rule to the DPI rule base.
  • Unidentified data is obtained from the acquired Internet data, its service features are analyzed, new DPI rules are compiled based on those features, and the DPI rule base is updated with the new DPI rules, completing the real-time automatic update of the DPI rule base. This avoids the problem that the DPI rules of the DPI rule base cannot accurately identify the service data in the Internet data, and improves the recognition rate and accuracy of data identification.
  • FIG. 1 is a schematic flow chart of a first embodiment of a method for generating a DPI rule in this document;
  • FIG. 2 is a schematic diagram of a refinement process of step S40 in FIG. 1;
  • FIG. 3 is a schematic flow chart of the refinement of the first embodiment of step S20 of FIG. 1;
  • FIG. 4 is a schematic flow chart of the refinement of the second embodiment of step S20 of FIG. 1;
  • FIG. 5 is a schematic diagram showing the refinement process of step S23 in FIG. 3;
  • FIG. 6 is a schematic diagram of functional modules of a first embodiment of a DPI rule generating apparatus
  • FIG. 7 is a schematic diagram of a refinement function module of the storage module in FIG. 6;
  • FIG. 8 is a schematic diagram of a refinement function module of the first embodiment of the analysis module of FIG. 6;
  • FIG. 9 is a schematic diagram of a refinement function module of the second embodiment of the analysis module of FIG. 6;
  • FIG. 10 is a schematic diagram of a refinement function module of the analysis unit of FIG. 8.
  • FIG. 1 is a schematic flowchart diagram of a first embodiment of a method for generating a DPI rule.
  • the method for generating the DPI rule includes:
  • Step S10 Obtain internet data, and identify the internet data based on existing DPI rules in the DPI rule base;
  • Collect/acquire mobile internet data and identify the collected/acquired mobile internet data based on existing DPI rules in the DPI rule base.
  • Step S20 analyzing the unidentified data to obtain the first feature of the unidentified data when the Internet data has unidentified data, wherein the unidentified data is the Internet data that is not recognized by the existing DPI rules;
  • the unidentified data is analyzed to obtain the first feature of the unidentified data.
  • the first feature may be acquired with a data mining algorithm that uses the features included in the intrinsic feature set, the load of the unidentified data, and/or the common features shared by the same sequence of most data streams in the unidentified data.
  • the first feature is one or more of: a feature included in the intrinsic feature set, the load of the unidentified data, and a common feature of the same sequence of most data streams in the unidentified data. Here the load refers to the encrypted data sequences, such as those of the corresponding server, contained in the unidentified data. For example, when the Sohu website is accessed, parsing the fixed HOST field of the HTTP protocol header in the usual way yields the first feature www.sohu.com.
  • Step S30 compiling and generating a new DPI rule based on the first feature
  • a new DPI rule is generated based on the acquired first feature compilation.
  • the compilation may use the compiling method of the existing DPI rules or another compiling method, for example an optimized version of the existing compiling method; this embodiment does not limit it further.
  • Step S40 storing the newly generated DPI rule to the DPI rule base.
  • the new DPI rules generated by the compilation are stored in the DPI rule base, that is, the DPI rule base is updated based on the new DPI rules generated by the compilation, and the update process adopts a hot update.
  • step S40 includes:
  • Step S41 Determine whether the generated DPI rule conflicts with an existing DPI rule of the DPI rule base.
  • Step S42 When the generated DPI rule does not conflict with the existing DPI rule of the DPI rule base, the DPI rule is stored in the DPI rule base.
  • When the newly generated DPI rule conflicts with an existing DPI rule of the DPI rule base, the newly generated DPI rule is modified, and the modified rule is stored to the DPI rule base once it no longer conflicts with the existing DPI rules of the DPI rule base.
  • Priorities may also be set for the newly generated DPI rule and the existing DPI rule of the DPI rule base, and the DPI rule base is then updated based on the prioritized rules. If the modified newly generated DPI rule still conflicts with the existing DPI rules of the DPI rule base, it is modified further or discarded.
  • the DPI rule of the updated DPI rule base can be used to identify the newly appearing business data in the mobile internet.
  • the method for generating a DPI rule in this embodiment first acquires Internet data and identifies it according to the existing DPI rules of the DPI rule base; then, when the Internet data has unidentified data, analyzes the unidentified data to obtain a first feature of the unidentified data; then compiles a new DPI rule based on the first feature; and finally stores the new DPI rule to the DPI rule base.
  • Unidentified data is obtained from the acquired Internet data, its service features are analyzed, new DPI rules are compiled based on those features, and the DPI rule base is updated with the new DPI rules, completing the real-time automatic update of the DPI rule base. This avoids the problem that the DPI rules of the DPI rule base cannot accurately identify the service data in the Internet data, and improves the recognition rate and accuracy of data identification.
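The four steps S10 to S40 can be sketched end to end as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the `Rule` and `RuleBase` classes, the byte-substring matching, and the use of the HTTP Host header as the only first feature are all hypothetical simplifications chosen to mirror the www.sohu.com example.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Rule:
    """Hypothetical DPI rule: a service name plus a byte pattern to look for."""
    service: str
    pattern: bytes

    def matches(self, payload: bytes) -> bool:
        return self.pattern in payload


@dataclass
class RuleBase:
    """Hypothetical rule base holding the existing DPI rules."""
    rules: List[Rule] = field(default_factory=list)

    def identify(self, payload: bytes) -> Optional[str]:
        # S10: return the service of the first matching rule, if any.
        for rule in self.rules:
            if rule.matches(payload):
                return rule.service
        return None

    def store(self, rule: Rule) -> bool:
        # S40 (simplified): skip a rule whose pattern is already stored.
        if any(r.pattern == rule.pattern for r in self.rules):
            return False
        self.rules.append(rule)
        return True


def extract_host(payload: bytes) -> Optional[bytes]:
    # S20 (simplified): take the HTTP Host header value as the first feature.
    for line in payload.split(b"\r\n"):
        if line.lower().startswith(b"host:"):
            return line.split(b":", 1)[1].strip()
    return None


def update_rule_base(rule_base: RuleBase, packets: Iterable[bytes]) -> List[Rule]:
    # S10: keep only the data that no existing rule recognizes.
    unidentified = [p for p in packets if rule_base.identify(p) is None]
    added = []
    for payload in unidentified:
        feature = extract_host(payload)                          # S20
        if feature is None:
            continue
        rule = Rule(service=feature.decode(), pattern=feature)   # S30
        if rule_base.store(rule):                                # S40
            added.append(rule)
    return added
```

Because the rule base is updated in place, data that was unidentified before the call can be recognized by `identify` immediately afterwards, which loosely corresponds to the hot update mentioned in step S40.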
  • FIG. 3 is a schematic diagram of a refinement process of the first embodiment of step S20 of FIG.
  • step S20 includes:
  • Step S21 Acquire a second feature of the unidentified data when the Internet data has unidentified data
  • the second feature includes a plaintext data feature of unidentified data such as a domain name, and the second feature is used as a service name corresponding to the unidentified data.
  • Step S22 filtering the unidentified data based on the second feature to obtain service data.
  • The user data of the unidentified data and the target IP address corresponding to that user data are acquired. When both the user data and its target IP address match the second feature, the unidentified data corresponding to the user data is service data; when they cannot both be matched with the second feature, the unidentified data corresponding to the user data is non-service data.
  • In this embodiment, filtering may either delete the non-service data from the unidentified data or mark it as non-service data.
  • Step S23 analyzing the service data to obtain a service feature of the service data
  • the business characteristics of the business data may be analyzed based on a data mining algorithm using features included in the intrinsic feature set, loads of unidentified data, and/or common features of the same sequence of most data streams in unidentified data.
  • the service feature is one or more of: a feature included in the intrinsic feature set, the load of the unidentified data, and a common feature of the same sequence of most data streams in the unidentified data. Here the load refers to the encrypted data sequences, such as those of the corresponding server, contained in the unidentified data.
  • Step S24 the acquired service feature is used as the first feature of the unidentified data.
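The filtering of step S22 can be illustrated with a short sketch. The `Flow` and `SecondFeature` records and their fields are hypothetical; in particular, the text does not say what form the user data takes, so it is modeled here as an opaque string. The sketch keeps the rule stated above: a flow counts as service data only when both its user data and its target IP address match the second feature, and non-service data is either deleted or marked.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class Flow:
    """Hypothetical record for one piece of unidentified data."""
    user_data: str          # assumed to be an opaque string carried by the flow
    target_ip: str
    is_service: bool = False


@dataclass
class SecondFeature:
    """Hypothetical second feature: a domain name plus what is known to belong to it."""
    name: str
    target_ips: Set[str]
    user_data: Set[str]


def filter_flows(flows: List[Flow], feature: SecondFeature, delete: bool = True) -> List[Flow]:
    kept = []
    for flow in flows:
        # S22: service data only if BOTH fields match the second feature.
        flow.is_service = (flow.target_ip in feature.target_ips
                           and flow.user_data in feature.user_data)
        # delete=True drops non-service data; delete=False only marks it.
        if flow.is_service or not delete:
            kept.append(flow)
    return kept
```

Passing `delete=False` returns every flow with its `is_service` flag set, which corresponds to the marking variant rather than the deleting one.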
  • the method for generating the DPI rule further includes:
  • Step S25 Obtain a first target IP address and/or first user data corresponding to the second feature.
  • the first target IP address and the first user data are respectively a target IP address and user data corresponding to the second feature in the unidentified data.
  • Step S26 when the first target IP address is inconsistent with the second target IP address corresponding to the second feature, using the second target IP address to update the first target IP address in the unidentified data; and/or, when the first user data is inconsistent with the second user data corresponding to the second feature, using the second user data to update the first user data in the unidentified data.
  • the second target IP address is a target IP address corresponding to the second feature in the Internet data
  • the second user data is user data corresponding to the second feature in the Internet data.
  • Updating the first target IP address and/or the first user data in this way supplements the unidentified data and ensures the integrity of the unidentified data corresponding to the second feature, thereby improving the accuracy of the subsequently generated DPI rules.
  • For example, suppose the first target IP address corresponding to the acquired second feature is A. After the application version of service X is upgraded, the acquired Internet data may include data whose second target IP address corresponding to the second feature is B, where address B is the new target IP address in the data generated by the upgraded version and address A is the target IP address in the data generated by the old version before the upgrade. Because addresses A and B are inconsistent, the second target IP address B is used to update the first target IP address A in the unidentified data: either B is added to the first target IP address, so that the updated first target IP address corresponding to the second feature is A and B, or the first target IP address is updated to B.
  • Similarly, suppose the first user data corresponding to the acquired second feature is C. After the application version of service Y is upgraded, the acquired Internet data may include data whose second user data corresponding to the second feature is D, where user data D is the new user data in the data generated by the upgraded version and user data C is the user data in the data generated by the old version before the upgrade. The second user data D is then used to update the first user data C in the unidentified data: either D is added to the first user data, so that the updated first user data is C and D, or the first user data is updated to D.
  • the unidentified data is filtered by the second feature to obtain the service data in the unidentified data, the service feature of the service data is obtained by analyzing the service data, and the obtained service feature is used as the first feature of the unidentified data. This improves the accuracy of the first feature, and thereby the accuracy of the subsequently generated DPI rule.
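The update of steps S25 and S26 admits the two variants given in the A/B and C/D examples: appending the new value or replacing the old one. A minimal sketch, assuming the values known for a second feature are held in sets (the function name and the `replace` flag are hypothetical):

```python
from typing import Set


def update_known(first: Set[str], second: Set[str], replace: bool = False) -> Set[str]:
    """Update the first target IPs (or first user data) with the second ones."""
    if first == second:
        # Consistent: nothing to update.
        return set(first)
    if replace:
        # Variant 2: the first value is updated to the second value (B, or D).
        return set(second)
    # Variant 1: the second value is added (A and B, or C and D).
    return first | second
```

For instance, `update_known({"A"}, {"B"})` gives a set holding both A and B, while `update_known({"A"}, {"B"}, replace=True)` keeps only B.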
  • FIG. 5 is a schematic diagram of the refinement process of step S23 in FIG.
  • step S23 includes:
  • Step S231 dividing the service data into multiple groups of service data groups
  • the service data is divided into multiple groups of service data groups in units of user flows, and the service data may be grouped by a group of N user flows.
  • User flow refers to the data flow generated during the process of connecting a server to an IP address when the user accesses a server.
  • Step S232 Perform data mining on the same sequence of load packets of the service data group to obtain service features of the plurality of service data groups.
  • A data mining algorithm is applied to the same sequence of load packets of each service data group to obtain the service feature of each service data group. The service feature is a common feature that covers at least a preset ratio of the service data in the group; that is, at least the preset ratio of the service data in the service data group includes the service feature. The preset ratio is set in advance to ensure the accuracy of the subsequently generated DPI rule, and can be set to 90%, 95%, and so on according to requirements.
  • Step S233 the obtained plurality of the service features are used as the first feature of the unidentified data.
  • the service data is divided into multiple service data groups; data mining is performed on the same sequence of load packets of each service data group to obtain the service features of the service data groups, and the obtained service features are used as the first feature of the unidentified data. Obtaining the service features by data mining improves their accuracy.
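Steps S231 to S233 can be sketched as below. The text does not name a specific data mining algorithm, so this example substitutes a deliberately simple stand-in: counting the leading bytes of the packet at each position across the flows of a group and keeping the prefixes that reach the preset ratio. The function names, the fixed `prefix_len`, and the representation of a user flow as a list of payload packets are all assumptions.

```python
from collections import Counter
from typing import List, Tuple

UserFlow = List[bytes]  # assumed: one user flow is an ordered list of payload packets


def group_flows(flows: List[UserFlow], n: int) -> List[List[UserFlow]]:
    # S231: divide the service data into groups of N user flows.
    return [flows[i:i + n] for i in range(0, len(flows), n)]


def mine_group(group: List[UserFlow], ratio: float = 0.9,
               prefix_len: int = 4) -> List[Tuple[int, bytes]]:
    # S232 (stand-in): for each packet position (the "same sequence"), count the
    # leading bytes of that packet across the group's flows and keep the prefixes
    # shared by at least `ratio` of the flows in the group.
    features = []
    depth = max((len(flow) for flow in group), default=0)
    for seq in range(depth):
        counts = Counter(flow[seq][:prefix_len] for flow in group if len(flow) > seq)
        for prefix, count in counts.items():
            if count / len(group) >= ratio:
                features.append((seq, prefix))
    return features


def mine_service_features(flows: List[UserFlow], n: int, ratio: float = 0.9,
                          prefix_len: int = 4) -> List[List[Tuple[int, bytes]]]:
    # S233: the features of all groups together serve as the first feature.
    return [mine_group(g, ratio, prefix_len) for g in group_flows(flows, n)]
```

Each result entry is a `(packet position, byte prefix)` pair. A real deployment would likely use a richer mining method than fixed-length prefixes; the ratio threshold is the part taken from the text.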
  • A device for generating DPI rules is also provided.
  • FIG. 6 is a schematic diagram of functional modules of a first embodiment of a DPI rule generating apparatus.
  • the generating device of the DPI rule includes:
  • the identification module 10 is configured to acquire Internet data, and identify the Internet data based on a DPI rule of the DPI rule base;
  • the identification module 10 collects/acquires mobile internet data and identifies the collected/acquired mobile internet data based on the DPI rules of the DPI rule base.
  • the analyzing module 20 is configured to analyze the unidentified data to obtain the first feature of the unidentified data when the Internet data has unidentified data, wherein the unidentified data is the Internet data that is not recognized by the DPI rules;
  • the analyzing module 20 is configured to analyze the unidentified data to obtain the first feature of the unidentified data.
  • the analysis module 20 is configured to acquire the first feature with a data mining algorithm that uses the features included in the intrinsic feature set, the load of the unidentified data, and/or the common features shared by the same sequence of most data streams in the unidentified data.
  • the first feature is one or more of: a feature included in the intrinsic feature set, the load of the unidentified data, and a common feature of the same sequence of most data streams in the unidentified data. Here the load refers to the encrypted data sequences, such as those of the corresponding server, contained in the unidentified data.
  • the compiling module 30 is configured to generate a DPI rule based on the first feature compilation
  • the compiling module 30 is configured to generate a DPI rule based on the acquired first feature compilation.
  • the compiling method may adopt the compiling method of the existing DPI rules, and may also adopt other compiling methods, such as the compiling mode optimized by the existing DPI rules, which is not further limited in this embodiment.
  • the storage module 40 is configured to store the newly generated DPI rule to the DPI rule base.
  • the storage module 40 stores the new DPI rules generated by the compilation into the DPI rule base, that is, updates the DPI rule base based on the new DPI rules generated by the compilation, and the update process adopts a hot update.
  • the storage module 40 includes:
  • the determining unit 41 is configured to determine whether the generated new DPI rule conflicts with an existing DPI rule of the DPI rule base;
  • the storage unit 42 is configured to store the newly generated DPI rule to the DPI rule base when there is no conflict between the newly generated DPI rule and the existing DPI rule of the DPI rule base.
  • When the data matched by the newly generated DPI rule is the same as the data matched by an existing DPI rule of the DPI rule base, or contains or is contained in that data, the determining unit 41 determines that the newly generated DPI rule conflicts with the existing DPI rule of the DPI rule base.
  • In that case the newly generated DPI rule is modified, and the modified rule is stored to the DPI rule base when it no longer conflicts with the existing DPI rules of the DPI rule base. Where the data matched by the newly generated DPI rule contains or is contained in the data matched by an existing DPI rule, priorities are set for the newly generated DPI rule and the existing DPI rule, and the DPI rule base is updated based on the prioritized rules.
  • In the device for generating DPI rules in this embodiment, the identification module 10 first acquires Internet data and identifies it based on the existing DPI rules of the DPI rule base; then, when the Internet data has unidentified data, the analysis module 20 analyzes the unidentified data to obtain the first feature of the unidentified data; the compiling module 30 then compiles a new DPI rule based on the first feature; and finally the storage module 40 stores the newly generated DPI rule to the DPI rule base.
  • The unidentified data is obtained and analyzed, a DPI rule is compiled based on the service feature, and the DPI rule base is updated with the newly generated DPI rule, completing the real-time automatic update of the DPI rule base. This avoids the problem that the DPI rules of the DPI rule base cannot accurately identify the service data in the Internet data, and improves the recognition rate and accuracy of data identification.
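The conflict test applied by the determining unit 41 before storage can be sketched with set comparisons. This is an illustrative reading, not the patent's procedure: rules are modeled as plain byte patterns, "the data matched by a rule" is approximated by the indices of a sample of payloads that contain the pattern, and a rule that matches nothing in the sample is assumed not to conflict. All names are hypothetical.

```python
from typing import List, Set


def matched(pattern: bytes, samples: List[bytes]) -> Set[int]:
    """Indices of the sample payloads that the pattern matches."""
    return {i for i, s in enumerate(samples) if pattern in s}


def conflicts(new_pattern: bytes, existing_pattern: bytes, samples: List[bytes]) -> bool:
    new_hits = matched(new_pattern, samples)
    old_hits = matched(existing_pattern, samples)
    if not new_hits or not old_hits:
        # Assumption: a rule that matches nothing in the sample cannot be judged.
        return False
    # Conflict: same data, or one rule's data contains the other's.
    return new_hits <= old_hits or old_hits <= new_hits


def store_rule(rule_base: List[bytes], new_pattern: bytes, samples: List[bytes]) -> bool:
    # Store the new rule only when it conflicts with no existing rule.
    if any(conflicts(new_pattern, old, samples) for old in rule_base):
        return False
    rule_base.append(new_pattern)
    return True
```

A caller that gets `False` back would then modify the rule, assign priorities, or discard it, as the text describes; those follow-up actions are left out of the sketch.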
  • FIG. 8 is a schematic diagram of a refinement function module of the first embodiment of the analysis module of FIG. 6.
  • the analysis module 20 includes:
  • the first obtaining unit 21 is configured to acquire a second feature of the unidentified data when the Internet data has unidentified data;
  • the first acquiring unit 21 is configured to acquire the second feature of the unidentified data by parsing the plaintext data of the unidentified data with an existing protocol. The second feature includes plaintext data features of the unidentified data, such as the domain name, and is used as the service name corresponding to the unidentified data.
  • the filtering unit 22 is configured to filter the unidentified data based on the second feature to obtain service data
  • the filtering unit 22 is configured to filter unidentified data, remove non-service data of unidentified data, and ensure that the remaining unidentified data is pure service data.
  • The user data of the unidentified data and the target IP address corresponding to that user data are acquired. When both the user data and its target IP address match the second feature, the unidentified data corresponding to the user data is service data; when they cannot both be matched with the second feature, the unidentified data corresponding to the user data is non-service data.
  • In this embodiment, filtering may either delete the non-service data from the unidentified data or mark it as non-service data.
  • the analyzing unit 23 is configured to analyze the service data to obtain a service feature of the service data, and use the acquired service feature as the first feature of the unidentified data.
  • the analyzing unit 23 is configured to analyze the service features of the service data with a data mining algorithm that uses the features included in the intrinsic feature set, the load of the unidentified data, and/or the common features of the same sequence of most data streams in the unidentified data.
  • the service feature is one or more of: a feature included in the intrinsic feature set, the load of the unidentified data, and a common feature of the same sequence of most data streams in the unidentified data. Here the load refers to the encrypted data sequences, such as those of the corresponding server, contained in the unidentified data.
  • FIG. 9 is a schematic diagram of a refinement function module of the second embodiment of the analysis module of FIG. 6.
  • the analysis module 20 further includes:
  • the second obtaining unit 24 is configured to acquire a first target IP address and/or first user data corresponding to the second feature
  • the first target IP address and the first user data are respectively a target IP address and user data corresponding to the second feature in the unidentified data.
  • the updating unit 25 is configured to update, by using the second target IP, the first target IP address in the unidentified data when the first target IP address is inconsistent with the second target IP address corresponding to the second feature And/or, when the first user data is inconsistent with the second user data corresponding to the second feature, the first user data in the unidentified data is updated by using the second user data.
  • the second target IP address is a target IP address corresponding to the second feature in the Internet data
  • the second user data is user data corresponding to the second feature in the Internet data.
  • the updating unit 25 updates the first target IP address in the unidentified data by using the second target IP address, and/or updates the first user data in the unidentified data by using the second user data. Supplementing the unidentified data in this way ensures the integrity of the unidentified data corresponding to the second feature, thereby improving the accuracy of the subsequently generated DPI rules.
  • the filtering unit 22 filters the unidentified data based on the second feature to obtain the service data in the unidentified data, the service feature of the service data is obtained by analyzing the service data, and the obtained service feature is used as the first feature of the unidentified data. This improves the accuracy of the first feature, and thereby the accuracy of the subsequently generated DPI rules.
  • FIG. 10 is a schematic diagram of a refinement function module of the analysis unit of FIG.
  • the analysis unit 23 includes:
  • the grouping subunit 231 is configured to divide the service data into multiple groups of service data groups
  • the grouping subunit 231 is configured to divide the service data into at least two sets of service data groups in units of user streams, and may group the service data in groups of N user streams.
  • User flow refers to all data in the access process when a user successfully accesses a server IP.
  • the data mining sub-unit 232 is configured to perform data mining on the same sequence of load packets of the service data groups, to obtain service features of the plurality of service data groups, and to use the plurality of obtained service features as the first feature of the unidentified data.
  • the data mining sub-unit 232 applies a data mining algorithm to the same sequence of load packets of each service data group to obtain the service feature of that group. The service feature is a common feature that covers at least a preset ratio of the service data in the group; that is, at least the preset ratio of the service data in the service data group includes the service feature. The preset ratio is set in advance to ensure the accuracy of the subsequently generated DPI rule, and can be set to 90%, 95%, and so on according to requirements.
  • the grouping subunit 231 divides the service data into multiple service data groups, and the data mining subunit 232 performs data mining on the same sequence of load packets of the service data groups to obtain the service features of the multiple service data groups. Grouping the data and obtaining the service features by data mining improves the accuracy of the service features.
  • the embodiment of the present invention further provides a computer readable storage medium, which stores program instructions, and when the program instructions are executed by the processor, can implement a DPI rule generation method provided by an embodiment of the present invention.
  • the method provided by the embodiment of the present invention acquires unidentified data according to the acquired Internet data, analyzes the service characteristics of the unidentified data, compiles and generates a new DPI rule based on the service feature, and updates the DPI rule base based on the newly generated DPI rule, and completes
  • the real-time automatic update of the DPI rule base avoids the problem that the DPI rule of the DPI rule base cannot accurately identify the business data in the Internet data, and improves the recognition rate and accuracy of the data recognition.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

Disclosed is a method for generating a DPI rule, comprising: acquiring Internet data, and identifying the Internet data based on the DPI rules of a DPI rule repository; when there is unidentified data in the Internet data, analyzing the unidentified data to acquire a first characteristic of the unidentified data; compiling and generating a DPI rule based on the first characteristic; and storing the DPI rule in the DPI rule repository.

Description

Method and device for generating DPI rules

Technical Field

This document relates to, but is not limited to, the field of network data transmission technologies, and in particular to a method, a device, and a computer-readable storage medium for generating DPI rules.

Background

DPI (Deep Packet Inspection) is a technique for distinguishing between different service flows in a network. By analyzing the deep feature values and protocol behavior of the packets in a service flow, DPI identifies data attributes and service types, and by labeling different customers and different services it supports fine-grained analysis and control of network services.

At present, mobile Internet applications emerge continuously, and versions of the same application are updated frequently. As a result, the DPI rules in the current DPI rule base, which are based on identifying known services, cannot meet the needs of service analysis, and the DPI rules cannot accurately identify the service data in Internet data.

Summary of the Invention

The following is an overview of the subject matter described in detail herein. This overview is not intended to limit the scope of protection of the claims.

Embodiments of the present invention provide a method, a device, and a computer-readable storage medium for generating DPI rules, which aim to solve the technical problem that the DPI rules in a DPI rule base cannot accurately identify the service data in Internet data.

An embodiment of the present invention provides a method for generating DPI rules, the method comprising the following steps:

acquiring Internet data, and identifying the Internet data based on the existing DPI rules in a DPI rule base;

when unidentified data exists in the Internet data, analyzing the unidentified data to obtain a first feature of the unidentified data, where the unidentified data is the Internet data that the DPI rules cannot identify;

compiling and generating a new DPI rule based on the first feature; and

storing the new DPI rule in the DPI rule base.
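
The four steps above can be sketched as a small pipeline. The following Python is a minimal illustration under stated assumptions, not the claimed implementation: the `Rule` type, its substring matching, and the `analyze` callback (which stands in for both the feature analysis and the rule compilation) are hypothetical names introduced for this example.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Rule:
    service: str    # service name the rule identifies (hypothetical field)
    pattern: bytes  # byte pattern looked for in a payload (hypothetical field)

    def matches(self, payload: bytes) -> bool:
        return self.pattern in payload


def generate_rules(payloads: List[bytes], rule_base: List[Rule],
                   analyze: Callable[[List[bytes]], Optional[Rule]]) -> List[bytes]:
    """Identify traffic, analyze what is left, and store any new rule."""
    # Step 1: identify with the existing rules; keep what no rule matches.
    unidentified = [p for p in payloads
                    if not any(r.matches(p) for r in rule_base)]
    if unidentified:
        # Steps 2 and 3: derive a feature and compile it into a rule
        # (both are delegated to the assumed `analyze` callback).
        new_rule = analyze(unidentified)
        # Step 4: store the new rule in the rule base.
        if new_rule is not None:
            rule_base.append(new_rule)
    return unidentified
```

After one pass, traffic that was previously unidentified is matched by the stored rule on the next pass, which is the behavior the method is aiming for.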

Optionally, the step of analyzing the unidentified data to obtain the first feature of the unidentified data when unidentified data exists in the Internet data comprises:

when unidentified data exists in the Internet data, obtaining a second feature of the unidentified data;

filtering the unidentified data based on the second feature to obtain service data;

analyzing the service data to obtain a service feature of the service data; and

using the obtained service feature as the first feature of the unidentified data.

Optionally, between the step of obtaining the second feature of the unidentified data when unidentified data exists in the Internet data and the step of filtering the unidentified data based on the second feature to obtain service data, the method for generating DPI rules further comprises:

obtaining a first target IP address and/or first user data corresponding to the second feature;

when the first target IP address is inconsistent with a second target IP address corresponding to the second feature, updating the first target IP address in the unidentified data with the second target IP address;

and/or, when the first user data is inconsistent with second user data corresponding to the second feature, updating the first user data in the unidentified data with the second user data.

Optionally, the step of analyzing the service data to obtain the service feature of the service data comprises:

dividing the service data into multiple service data groups;

performing data mining on the same-sequence payload packets of the service data groups to obtain the service features of the service data groups; and

using the obtained service features as the first feature of the unidentified data.

Optionally, the step of storing the new DPI rule in the DPI rule base comprises:

determining whether the generated new DPI rule conflicts with an existing DPI rule in the DPI rule base; and

when the generated new DPI rule does not conflict with the existing DPI rules in the DPI rule base, storing the new DPI rule in the DPI rule base.

In addition, an embodiment of the present invention further provides a device for generating DPI rules, the device comprising:

an identification module, configured to acquire Internet data and identify the Internet data based on the existing DPI rules in a DPI rule base;

an analysis module, configured to analyze unidentified data, when unidentified data exists in the Internet data, to obtain a first feature of the unidentified data, where the unidentified data is the Internet data that the DPI rules cannot identify;

a compiling module, configured to compile and generate a new DPI rule based on the first feature; and

a storage module, configured to store the new DPI rule in the DPI rule base.

Optionally, the analysis module comprises:

a first acquiring unit, configured to obtain a second feature of the unidentified data when unidentified data exists in the Internet data;

a filtering unit, configured to filter the unidentified data based on the second feature to obtain service data; and

an analysis unit, configured to analyze the service data to obtain a service feature of the service data, and to use the obtained service feature as the first feature of the unidentified data.

Optionally, the analysis module further comprises:

a second acquiring unit, configured to obtain a first target IP address and/or first user data corresponding to the second feature; and

an update unit, configured to update the first target IP address in the unidentified data with a second target IP address corresponding to the second feature when the first target IP address is inconsistent with the second target IP address; and/or to update the first user data in the unidentified data with second user data corresponding to the second feature when the first user data is inconsistent with the second user data.

Optionally, the analysis unit comprises:

a grouping subunit, configured to divide the service data into multiple service data groups; and

a data mining subunit, configured to perform data mining on the same-sequence payload packets of the service data groups to obtain the service features of the service data groups, and to use the obtained service features as the first feature of the unidentified data.

Optionally, the storage module comprises:

a determining unit, configured to determine whether the generated new DPI rule conflicts with an existing DPI rule in the DPI rule base; and

a storage unit, configured to store the new DPI rule in the DPI rule base when the generated new DPI rule does not conflict with the existing DPI rules in the DPI rule base.

An embodiment of the present invention further provides a computer-readable storage medium storing program instructions which, when executed by a processor, implement the method for generating DPI rules provided by the embodiments of the present invention.

In the embodiments of the present invention, Internet data is first acquired and identified based on the existing DPI rules in the DPI rule base, which yields the unidentified data. The unidentified data is then analyzed to obtain its service features, a new DPI rule is compiled and generated based on those service features, and finally the new DPI rule is stored in the DPI rule base. Obtaining unidentified data from the acquired Internet data, analyzing its service features, compiling a new DPI rule from those features, and updating the DPI rule base with the new rule completes a real-time automatic update of the DPI rule base. This avoids the problem that the DPI rules in the rule base cannot accurately identify the service data in Internet data, and improves the recognition rate and accuracy of data identification.

Other aspects will become apparent upon reading and understanding the drawings and the detailed description.

Brief Description of the Drawings

FIG. 1 is a schematic flowchart of a first embodiment of the method for generating DPI rules described herein;

FIG. 2 is a schematic flowchart detailing step S40 in FIG. 1;

FIG. 3 is a schematic flowchart detailing a first embodiment of step S20 in FIG. 1;

FIG. 4 is a schematic flowchart detailing a second embodiment of step S20 in FIG. 1;

FIG. 5 is a schematic flowchart detailing step S23 in FIG. 3;

FIG. 6 is a schematic diagram of the functional modules of a first embodiment of the device for generating DPI rules;

FIG. 7 is a schematic diagram of the detailed functional modules of the storage module in FIG. 6;

FIG. 8 is a schematic diagram of the detailed functional modules of a first embodiment of the analysis module in FIG. 6;

FIG. 9 is a schematic diagram of the detailed functional modules of a second embodiment of the analysis module in FIG. 6;

FIG. 10 is a schematic diagram of the detailed functional modules of the analysis unit in FIG. 8.

Preferred Embodiments of the Invention

Referring to FIG. 1, FIG. 1 is a schematic flowchart of a first embodiment of the method for generating DPI rules.

In this embodiment, the method for generating DPI rules includes:

Step S10: acquire Internet data, and identify the Internet data based on the existing DPI rules in the DPI rule base.

Mobile Internet data is collected/acquired, and the collected/acquired mobile Internet data is identified based on the existing DPI rules in the DPI rule base.

Step S20: when unidentified data exists in the Internet data, analyze the unidentified data to obtain a first feature of the unidentified data, where the unidentified data is the Internet data that cannot be identified according to the existing DPI rules.

When unidentified data exists in the Internet data, the unidentified data is analyzed to obtain its first feature. The first feature may be obtained with a data mining algorithm, using features contained in an intrinsic feature set, the payload of the unidentified data, and/or common features shared by the same-sequence packets of most data flows in the unidentified data. The first feature is one or more of: a feature contained in the intrinsic feature set, the payload of the unidentified data, and a common feature shared by the same-sequence packets of most data flows in the unidentified data. Here, the payload refers to encrypted data sequences, such as the corresponding server, contained in the unidentified data. For example, when the Sohu website is accessed, parsing the fixed HOST field of the HTTP protocol header in the usual way yields the first feature www.sohu.com.
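
The HOST example above can be illustrated with a short Python sketch. It assumes the payload is a plaintext HTTP/1.x request held in memory as bytes; the function name `extract_host` is hypothetical and the parsing is deliberately minimal.

```python
from typing import Optional


def extract_host(payload: bytes) -> Optional[str]:
    """Return the Host header value of a plaintext HTTP request, if present."""
    # Keep only the header section, which ends at the first blank line.
    head = payload.split(b"\r\n\r\n", 1)[0]
    # Skip the request line; the remaining lines are header fields.
    for line in head.split(b"\r\n")[1:]:
        name, sep, value = line.partition(b":")
        if sep and name.strip().lower() == b"host":
            return value.strip().decode("ascii", errors="replace")
    return None


request = b"GET / HTTP/1.1\r\nHost: www.sohu.com\r\nAccept: */*\r\n\r\n"
print(extract_host(request))  # www.sohu.com
```

Payloads that are not plaintext HTTP simply return `None`, in which case other features would have to be used.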

Step S30: compile and generate a new DPI rule based on the first feature.

A new DPI rule is compiled and generated based on the obtained first feature. The rule may be compiled in the same way as existing DPI rules or in some other way, for example an optimized version of the existing compilation method; this embodiment places no further limitation on the compilation method.

Step S40: store the newly generated DPI rule in the DPI rule base.

The new DPI rule generated by compilation is stored in the DPI rule base; that is, the DPI rule base is updated with the newly compiled rule, and the update is performed as a hot update.
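
The text does not say how the hot update is carried out. One common approach, shown here purely as an assumption, is to keep the rules in an immutable snapshot and replace the reference when a rule is added, so that identification can continue on the old snapshot while the update happens. The class name `RuleBase` and its methods are hypothetical.

```python
import threading
from typing import Tuple


class RuleBase:
    """Holds an immutable snapshot of rules that can be swapped while in use."""

    def __init__(self, rules: Tuple[bytes, ...] = ()):
        self._rules = tuple(rules)
        self._lock = threading.Lock()

    def snapshot(self) -> Tuple[bytes, ...]:
        # Readers take the current tuple; it is never mutated in place.
        return self._rules

    def add(self, pattern: bytes) -> None:
        # Writers build a new tuple and replace the reference in one step,
        # so a reader holding an older snapshot is not disturbed.
        with self._lock:
            self._rules = self._rules + (pattern,)
```

A reader that called `snapshot()` before an update keeps matching against the old rules until it asks for a fresh snapshot.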

In other embodiments, referring to FIG. 2, step S40 includes:

Step S41: determine whether the generated DPI rule conflicts with an existing DPI rule in the DPI rule base.

Step S42: when the generated DPI rule does not conflict with the existing DPI rules in the DPI rule base, store the DPI rule in the DPI rule base.

If the data matched by the newly generated DPI rule is the same as the data matched by an existing DPI rule in the DPI rule base, or if the data matched by the newly generated rule contains or is contained in the data matched by an existing rule, the newly generated DPI rule is determined to conflict with the existing DPI rule. When such a conflict exists, the newly generated rule and the conflicting existing rule are analyzed to find the cause of the conflict, and the newly generated rule is modified accordingly. Once the modified rule no longer conflicts with the existing rules in the DPI rule base, it is stored in the DPI rule base. Where the data matched by the newly generated rule contains or is contained in the data matched by an existing rule, priorities are set for the newly generated rule and the existing rule, and the DPI rule base is updated with the prioritized rules. If the modified rule still conflicts with the existing rules in the DPI rule base, it is modified further or discarded.
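
The three conflict cases described above (same data, containing, contained) amount to a set comparison. The sketch below assumes that the data matched by each rule can be represented as a set of flow identifiers; the function name `classify_conflict` and the returned labels are hypothetical.

```python
from typing import Set


def classify_conflict(new_matches: Set[str], old_matches: Set[str]) -> str:
    """Compare the data matched by a new rule and by an existing rule."""
    if new_matches == old_matches:
        return "same"
    if new_matches < old_matches:
        return "contained"   # new rule matches a strict subset
    if new_matches > old_matches:
        return "contains"    # new rule matches a strict superset
    # Partial overlap or disjoint sets: not one of the conflict cases listed.
    return "none"
```

In this reading, `"contained"` and `"contains"` are the cases where priorities would be assigned to the two rules, while `"same"` calls for modifying or discarding the new rule.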

After the DPI rule base has been updated with the newly compiled DPI rule, the DPI rules in the updated rule base can be used to identify service data newly appearing on the mobile Internet.

In the method for generating DPI rules of this embodiment, Internet data is first acquired and identified based on the existing DPI rules in the DPI rule base. Then, when unidentified data exists in the Internet data, the unidentified data is analyzed to obtain its first feature, a new DPI rule is compiled and generated based on the first feature, and finally the new DPI rule is stored in the DPI rule base. Obtaining unidentified data from the acquired Internet data, analyzing its service features, compiling a new DPI rule from those features, and updating the DPI rule base with the new rule completes a real-time automatic update of the DPI rule base. This avoids the problem that the DPI rules in the rule base cannot accurately identify the service data in Internet data, and improves the recognition rate and accuracy of data identification.

Referring to FIG. 3, FIG. 3 is a schematic flowchart detailing a first embodiment of step S20 in FIG. 1.

Optionally, based on the first embodiment, an embodiment detailing step S20 of the method for generating DPI rules is provided. In this embodiment, step S20 includes:

Step S21: when unidentified data exists in the Internet data, obtain a second feature of the unidentified data.

The second feature of the unidentified data is obtained from the plaintext portion of the unidentified data using existing protocols. The second feature comprises plaintext data features of the unidentified data, such as a domain name, and is used as the service name of the corresponding unidentified data.

Step S22: filter the unidentified data based on the second feature to obtain service data.

The unidentified data is filtered to remove non-service data, so that the remaining unidentified data is purely service data. This may be done by obtaining the user data in the unidentified data and the target IP address corresponding to that user data. When both the user data and its target IP address match the second feature, the unidentified data corresponding to that user data is service data; when they do not both match the second feature, the unidentified data corresponding to that user data is non-service data. In this embodiment, filtering may either delete the non-service data from the unidentified data or mark it as non-service data.
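
A minimal sketch of this filtering step follows. It assumes that "matching the second feature" can be modeled as membership in the sets of user data and target IP addresses associated with that feature; the `Flow` type, its fields, and the function name are hypothetical, and the IP addresses used with it would be placeholders.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass(frozen=True)
class Flow:
    user: str       # user data carried by the flow (hypothetical field)
    target_ip: str  # target IP address of the flow (hypothetical field)


def filter_service_data(flows: List[Flow], users: Set[str],
                        target_ips: Set[str]) -> Tuple[List[Flow], List[Flow]]:
    """Split flows into (service, non_service) by matching both fields."""
    service, non_service = [], []
    for flow in flows:
        # Service data only if BOTH the user data and the target IP match.
        if flow.user in users and flow.target_ip in target_ips:
            service.append(flow)
        else:
            non_service.append(flow)  # kept and marked rather than deleted
    return service, non_service
```

Returning the non-service flows separately corresponds to the "mark" option; discarding the second list corresponds to the "delete" option.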

Step S23: analyze the service data to obtain a service feature of the service data.

The service features of the service data may be analyzed with a data mining algorithm, using features contained in an intrinsic feature set, the payload of the unidentified data, and/or common features shared by the same-sequence packets of most data flows in the unidentified data. The service feature is one or more of: a feature contained in the intrinsic feature set, the payload of the unidentified data, and a common feature shared by the same-sequence packets of most data flows in the unidentified data. Here, the payload refers to encrypted data sequences, such as the corresponding server, contained in the unidentified data.

Step S24: use the obtained service feature as the first feature of the unidentified data.

In a second embodiment of the method for generating DPI rules, referring to FIG. 4, the method further includes, between step S21 and step S22:

Step S25: obtain the first target IP address and/or first user data corresponding to the second feature.

Here, the first target IP address and the first user data are, respectively, the target IP address and the user data corresponding to the second feature in the unidentified data.

Step S26: when the first target IP address is inconsistent with the second target IP address corresponding to the second feature, update the first target IP address in the unidentified data with the second target IP address;

and/or, when the first user data is inconsistent with the second user data corresponding to the second feature, update the first user data in the unidentified data with the second user data.

Here, the second target IP address is the target IP address corresponding to the second feature in the Internet data, and the second user data is the user data corresponding to the second feature in the Internet data.

The first target IP address and/or first user data corresponding to the second feature in the unidentified data are obtained. When the first target IP address is inconsistent with the second target IP address corresponding to the second feature, the first target IP address in the unidentified data is updated with the second target IP address; and/or, when the first user data is inconsistent with the second user data corresponding to the second feature, the first user data in the unidentified data is updated with the second user data. This completes the unidentified data and ensures the integrity of the unidentified data corresponding to the second feature, which in turn improves the accuracy of the subsequently generated DPI rules.

For example, for service X, suppose the first target IP address already obtained for the second feature is A. When the service rules of service X change with an upgrade of its application version, the acquired Internet data may include data whose second target IP address for the second feature is B, where B is the new target IP address in the data produced by the upgraded version and A is the target IP address in the data produced by the old version. A and B are then inconsistent, and the second target IP address B is used to update the first target IP address A in the unidentified data. Either B is added to the first target IP address, so that the first target IP addresses corresponding to the second feature become A and B, or the first target IP address is replaced with B.

Similarly, for service Y, suppose the first user data already obtained for the second feature is C. When the service rules of service Y change with an upgrade of its application version, the acquired Internet data may include data whose second user data for the second feature is D, where D is the new user data in the data produced by the upgraded version and C is the user data in the data produced by the old version. When D and C are inconsistent, the second user data D is used to update the first user data C in the unidentified data. Either D is added to the first user data, so that the first user data becomes C and D, or the first user data is replaced with D.
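
The two update options in the service X example (add B alongside A, or replace A with B) can be sketched as follows. Representing the addresses as sets of strings and the name `update_targets` are assumptions made for illustration; the same logic would apply to user data C and D.

```python
from typing import Set


def update_targets(first: Set[str], second: Set[str],
                   replace: bool = False) -> Set[str]:
    """Reconcile the target IP addresses recorded for a second feature."""
    if second <= first:
        return set(first)    # nothing new, so nothing is inconsistent
    if replace:
        return set(second)   # second option: keep only the new address
    return first | second    # first option: keep old and new addresses


print(sorted(update_targets({"A"}, {"B"})))                # ['A', 'B']
print(sorted(update_targets({"A"}, {"B"}, replace=True)))  # ['B']
```

Keeping both addresses lets the resulting rule cover traffic from users who have not yet upgraded, while replacing is the simpler choice once the old version is no longer in use.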

In this embodiment, the unidentified data is filtered by the second feature to obtain the service data in the unidentified data, the service data is analyzed to obtain its service feature, and the obtained service feature is used as the first feature of the unidentified data. This improves the accuracy of the first feature and, in turn, the accuracy of the subsequently generated DPI rules.

Referring to FIG. 5, FIG. 5 is a schematic flowchart detailing step S23 in FIG. 3.

Based on the previous embodiment, an embodiment detailing step S23 of the method for generating DPI rules is provided. In this embodiment, step S23 includes:

Step S231: divide the service data into multiple service data groups.

The service data is divided into multiple service data groups in units of user flows; when grouping, every N user flows may form one group. A user flow is the data flow generated during the connection with a server's IP address when a user accesses that server.
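
The grouping step can be sketched as below. The sketch assumes that a user flow can be keyed by the pair (user, server IP) and that packets arrive as simple tuples; the `Packet` layout and the function name `group_user_flows` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Packet = Tuple[str, str, bytes]  # (user, server_ip, payload); hypothetical layout


def group_user_flows(packets: List[Packet], n: int) -> List[List[List[bytes]]]:
    """Build user flows keyed by (user, server_ip), then group them n at a time."""
    if n < 1:
        raise ValueError("n must be at least 1")
    flows: Dict[Tuple[str, str], List[bytes]] = defaultdict(list)
    for user, server_ip, payload in packets:
        flows[(user, server_ip)].append(payload)  # keeps packet order per flow
    ordered = list(flows.values())                # dicts preserve insertion order
    # Each group holds up to n user flows; the last group may be smaller.
    return [ordered[i:i + n] for i in range(0, len(ordered), n)]
```

Each returned group is a list of flows, and each flow is the ordered list of its payload packets, which is the shape needed to compare packets at the same sequence position.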

Step S232: perform data mining on the same-sequence payload packets of the service data groups to obtain the service features of the service data groups.

A data mining algorithm is applied to the same-sequence payload packets of each service data group to obtain the service feature of that group. The service feature is a feature common to at least a preset proportion of the service data in the group; that is, at least the preset proportion of the group's service data contains the feature. The preset proportion is set in advance to ensure the accuracy of the subsequently generated DPI rules, and may be set to 90%, 95%, and so on as required.
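
The text does not name a specific data mining algorithm. As a simple stand-in, the sketch below counts fixed-length byte n-grams in the packet at one sequence position of every flow in a group and keeps those present in at least the preset proportion of the flows. The function name, the n-gram approach, and the parameters `seq`, `size`, and `ratio` are all assumptions made for illustration.

```python
from collections import Counter
from typing import List, Set


def common_ngrams(group: List[List[bytes]], seq: int, size: int,
                  ratio: float = 0.9) -> Set[bytes]:
    """Return byte n-grams found in at least `ratio` of the group's flows."""
    if size < 1:
        raise ValueError("size must be at least 1")
    # Take the packet at position `seq` from every flow that has one.
    packets = [flow[seq] for flow in group if len(flow) > seq]
    if not packets:
        return set()
    counts: Counter = Counter()
    for payload in packets:
        # Count each n-gram once per packet so coverage is measured per flow.
        counts.update({payload[i:i + size]
                       for i in range(len(payload) - size + 1)})
    needed = ratio * len(group)  # coverage is relative to the whole group
    return {gram for gram, hits in counts.items() if hits >= needed}
```

For a group of four flows whose first packets are `HELLO-abc`, `HELLO-xyz`, `HELLO-123`, and `other`, a size of 5 and a ratio of 0.75 keep `HELLO` and `ELLO-`, while a ratio of 1.0 keeps nothing.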

步骤S233,将获取的多个所述业务特征作为所述未识别数据的第一特征。Step S233, the obtained plurality of the service features are used as the first feature of the unidentified data.

本实施例中,通过将所述业务数据分为多组业务数据组;并对所述业务数据组的相同序列的载荷报文进行数据挖掘,以获取所述业务数据组的业务特征,并将获取的所述业务特征作为所述未识别数据的第一特征;分组且采用数据挖掘的方式获取业务数据的业务特征,提高了业务特征的准确性。In this embodiment, the service data is divided into multiple groups of service data groups; data mining of the same sequence of load packets of the service data group is performed, to obtain the service characteristics of the service data group, and The obtained service feature is used as the first feature of the unidentified data; the service feature of the service data is obtained by using the data mining method, and the accuracy of the service feature is improved.

一种DPI规则的生成装置。A device for generating DPI rules.

参照图6,图6为DPI规则的生成装置第一实施例的功能模块示意图。Referring to FIG. 6, FIG. 6 is a schematic diagram of functional modules of a first embodiment of a DPI rule generating apparatus.

在本实施例中,该DPI规则的生成装置包括:In this embodiment, the generating device of the DPI rule includes:

识别模块10,设置为获取互联网数据,基于DPI规则库的DPI规则识别所述互联网数据;The identification module 10 is configured to acquire Internet data, and identify the Internet data based on a DPI rule of the DPI rule base;

识别模块10采集/获取移动互联网数据,设置为DPI规则库的DPI规则识别采集/获取到的移动互联网数据。The identification module 10 collects/acquires mobile internet data, and the DPI rule set as the DPI rule base identifies the collected/acquired mobile internet data.

分析模块20,设置为在所述互联网数据存在未识别数据时,分析所述未识别数据,以获取所述未识别数据的第一特征,其中,所述未识别数据为所述互联网数据中所述DPI规则无法识别的互联网数据;The analyzing module 20 is configured to analyze the unidentified data to obtain the first feature of the unidentified data when the Internet data has unidentified data, wherein the unidentified data is in the Internet data Internet data that is not recognized by DPI rules;

When the Internet data contains unidentified data, the analysis module 20 analyzes the unidentified data to obtain its first feature. The analysis module 20 is configured to obtain the first feature by applying a data mining algorithm to the features contained in an inherent feature set, the payload of the unidentified data, and/or the common features of the same sequence across most data flows in the unidentified data. The first feature is one or more of the following: a feature contained in the inherent feature set, the payload of the unidentified data, and a common feature of the same sequence across most data flows in the unidentified data. Here, the payload refers to the encrypted data sequences contained in the unidentified data, such as those of the corresponding server.

A compiling module 30, configured to compile and generate a DPI rule based on the first feature;

The compiling module 30 compiles and generates a DPI rule based on the obtained first feature. The compilation may follow the existing way of compiling DPI rules or another way, such as an optimized variant of the existing one; this embodiment places no further limitation on it.

A storage module 40, configured to store the newly generated DPI rule in the DPI rule base.

The storage module 40 stores the newly compiled DPI rule in the DPI rule base; that is, it updates the DPI rule base with the newly generated rule, and the update is performed as a hot update.
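The text does not say how the hot update is implemented. One common approach, shown here purely as an assumption, is to keep the rule set as an immutable snapshot and replace the reference when a rule is added, so that identification already in progress keeps using a consistent set of rules. The class `RuleBase` and its methods are hypothetical names.

```python
import threading


class RuleBase:
    """Illustrative rule base that can be updated without stopping readers."""

    def __init__(self, rules=()):
        self._lock = threading.Lock()
        self._rules = tuple(rules)  # immutable snapshot of the current rules

    def snapshot(self):
        # Readers take the current tuple; later updates never mutate it.
        return self._rules

    def hot_add(self, rule):
        # Writers build a new tuple and swap the reference under a lock,
        # so concurrent writers do not lose each other's rules.
        with self._lock:
            self._rules = self._rules + (rule,)
```

A reader that called `snapshot()` before an update continues to see the old rules, while any later call sees the new rule as well.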

In other embodiments, referring to FIG. 7, the storage module 40 includes:

A determining unit 41, configured to determine whether the newly generated DPI rule conflicts with an existing DPI rule in the DPI rule base;

A storage unit 42, configured to store the newly generated DPI rule in the DPI rule base when it does not conflict with any existing DPI rule in the DPI rule base.

The determining unit 41 determines that the newly generated DPI rule conflicts with an existing DPI rule in the DPI rule base when the data matched by the new rule is identical to the data matched by the existing rule, or when the data matched by the new rule contains, or is contained in, the data matched by the existing rule. When a conflict exists, the new rule and the conflicting existing rule are analyzed to find the cause of the conflict, and the new rule is modified accordingly. If the modified rule no longer conflicts with the existing rules, it is stored in the DPI rule base. Where the data matched by the new rule contains, or is contained in, the data matched by an existing rule, priorities are set for the new rule and the existing rule, and the DPI rule base is updated with both rules at those priorities. If the modified rule still conflicts with an existing rule, it is either modified further or discarded.
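The three conflict cases named above (identical, contains, contained in) map naturally onto set comparisons. The following sketch assumes that the data matched by each rule can be represented as a set of flow identifiers; the function name `classify_conflict` and the returned labels are hypothetical.

```python
def classify_conflict(new_matches, existing_matches):
    """Compare the data matched by a new rule and by an existing rule.

    Both arguments are iterables of hashable flow identifiers (an assumption
    made for illustration). Partial overlap is reported as "no_conflict"
    because the text names only the identical and containment cases.
    """
    new_set, old_set = set(new_matches), set(existing_matches)
    if new_set == old_set:
        return "identical"
    if new_set < old_set:
        return "new_within_existing"
    if new_set > old_set:
        return "new_contains_existing"
    return "no_conflict"
```

For the two containment results, the text calls for assigning priorities to the two rules; how those priorities are chosen is not specified and is left out of the sketch.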

The DPI rule generating device of this embodiment works as follows. First, the identification module 10 acquires Internet data and identifies it based on the existing DPI rules in the DPI rule base. Next, when the Internet data contains unidentified data, the analysis module 20 analyzes the unidentified data to obtain its first feature. The compiling module 30 then compiles a new DPI rule from the first feature. Finally, the storage module 40 stores the new rule in the DPI rule base. By extracting the unidentified data from the acquired Internet data, analyzing its service features, compiling DPI rules from those features, and updating the DPI rule base with the new rules, the device updates the rule base automatically and in real time. This avoids the situation in which the rules in the DPI rule base cannot accurately identify service data in Internet data, and it improves the recognition rate and accuracy of data identification.
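The four-module flow just described can be sketched as a single function. This is an illustration under simplifying assumptions: each DPI rule is modeled as a predicate over a packet, and the analysis and compilation steps are passed in as placeholder callables because the text leaves their internals open. The name `update_rule_base` is hypothetical.

```python
def update_rule_base(packets, rule_base, analyze, compile_rule):
    """Run one identify -> analyze -> compile -> store cycle.

    rule_base is a list of predicates (packet -> bool); analyze and
    compile_rule are caller-supplied stand-ins for modules 20 and 30.
    Returns the new rule, or None if every packet was identified.
    """
    # Identification: a packet is unidentified if no existing rule matches it.
    unidentified = [p for p in packets if not any(rule(p) for rule in rule_base)]
    if not unidentified:
        return None
    first_feature = analyze(unidentified)    # analysis module 20
    new_rule = compile_rule(first_feature)   # compiling module 30
    rule_base.append(new_rule)               # storage module 40
    return new_rule
```

Running the function a second time on the same packets returns `None` when the new rule now matches the previously unidentified traffic, which is the intended effect of updating the rule base.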

Referring to FIG. 8, FIG. 8 is a schematic diagram of the refined functional modules of a first embodiment of the analysis module in FIG. 6.

This embodiment refines the analysis module of the first embodiment of the DPI rule generating device described above. In this embodiment, the analysis module 20 includes:

A first acquiring unit 21, configured to acquire a second feature of the unidentified data when the Internet data contains unidentified data;

The first acquiring unit 21 is configured to obtain the second feature from the plaintext data of the unidentified data using existing protocols. The second feature includes plaintext data features of the unidentified data, such as a domain name, and is used as the service name of the corresponding unidentified data.

A filtering unit 22, configured to filter the unidentified data based on the second feature to obtain service data;

The filtering unit 22 is configured to filter the unidentified data and remove the non-service data from it, so that what remains is purely service data. To do so, the user data in the unidentified data and the target IP address corresponding to that user data are obtained. When both the user data and its target IP address match the second feature, the unidentified data corresponding to that user data is service data; when they do not both match the second feature, it is non-service data. In this embodiment, filtering may either delete the non-service data from the unidentified data or mark it as non-service data.

An analyzing unit 23, configured to analyze the service data to obtain its service features and to use the obtained service features as the first feature of the unidentified data.

The analyzing unit 23 is configured to analyze the service features of the service data by applying a data mining algorithm to the features contained in an inherent feature set, the payload of the unidentified data, and/or the common features of the same sequence across most data flows in the unidentified data. The service feature is one or more of the following: a feature contained in the inherent feature set, the payload of the unidentified data, and a common feature of the same sequence across most data flows in the unidentified data. Here, the payload refers to the encrypted data sequences contained in the unidentified data, such as those of the corresponding server.
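The filtering performed by filtering unit 22 can be sketched as follows. The text does not define what it means for user data or a target IP address to "match" the second feature, so this sketch assumes the second feature is a domain name, that user data matches when it contains that domain, and that a target IP address matches when it belongs to a known set of addresses for the domain. `Record`, `filter_service_data`, and `feature_ips` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Record:
    """One piece of unidentified data (illustrative structure)."""
    user_data: str
    target_ip: str


def filter_service_data(records, second_feature, feature_ips):
    """Split unidentified records into service and non-service data.

    A record is service data only if BOTH its user data and its target IP
    match the second feature; the matching tests are assumptions.
    """
    service, non_service = [], []
    for rec in records:
        matches = second_feature in rec.user_data and rec.target_ip in feature_ips
        (service if matches else non_service).append(rec)
    return service, non_service
```

Returning the non-service records instead of dropping them corresponds to the option of marking them rather than deleting them; a caller that wants deletion can simply ignore the second list.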

Referring to FIG. 9, FIG. 9 is a schematic diagram of the refined functional modules of a second embodiment of the analysis module in FIG. 6.

In this embodiment, the analysis module 20 further includes:

A second acquiring unit 24, configured to acquire a first target IP address and/or first user data corresponding to the second feature;

Here, the first target IP address and the first user data are, respectively, the target IP address and the user data corresponding to the second feature in the unidentified data.

An updating unit 25, configured to update the first target IP address in the unidentified data with the second target IP address when the first target IP address is inconsistent with the second target IP address corresponding to the second feature; and/or to update the first user data in the unidentified data with the second user data when the first user data is inconsistent with the second user data corresponding to the second feature.

Here, the second target IP address is the target IP address corresponding to the second feature in the Internet data, and the second user data is the user data corresponding to the second feature in the Internet data.

The second acquiring unit 24 acquires the first target IP address and/or the first user data corresponding to the second feature in the unidentified data. When the first target IP address is inconsistent with the second target IP address corresponding to the second feature, the updating unit 25 updates the first target IP address in the unidentified data with the second target IP address; and/or, when the first user data is inconsistent with the second user data corresponding to the second feature, the updating unit 25 updates the first user data in the unidentified data with the second user data. This completes the unidentified data and ensures the integrity of the unidentified data corresponding to the second feature, which in turn improves the accuracy of the subsequently generated DPI rules.
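The update logic of units 24 and 25 amounts to overwriting a field of the unidentified record whenever it disagrees with the value seen in the full Internet data for the same second feature. A minimal sketch, assuming both sides are represented as dictionaries with the hypothetical keys `target_ip` and `user_data`:

```python
def reconcile(unidentified_record, internet_reference):
    """Return a copy of the record with inconsistent fields replaced.

    unidentified_record holds the "first" target IP and user data;
    internet_reference holds the "second" values for the same second
    feature. Keys missing from the reference are left untouched.
    """
    updated = dict(unidentified_record)
    for key in ("target_ip", "user_data"):
        if key in internet_reference and updated.get(key) != internet_reference[key]:
            updated[key] = internet_reference[key]
    return updated
```

The function returns a new dictionary instead of modifying its input, which keeps the original unidentified record available for comparison or auditing.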

In this embodiment, the filtering unit 22 filters the unidentified data based on the second feature to obtain the service data in it; the service data is then analyzed to obtain its service features, which are used as the first feature of the unidentified data. This improves the accuracy of the first feature and, in turn, of the subsequently generated DPI rules.

Referring to FIG. 10, FIG. 10 is a schematic diagram of the refined functional modules of the analyzing unit in FIG. 8.

Based on the previous embodiment, an embodiment refining the functional modules of the analyzing unit in the DPI rule generating device of the present invention is proposed. In this embodiment, the analyzing unit 23 includes:

A grouping subunit 231, configured to divide the service data into multiple service data groups;

The grouping subunit 231 is configured to divide the service data into at least two service data groups in units of user flows; alternatively, the service data may be grouped N user flows at a time. A user flow is all the data exchanged while a user successfully accesses a given server IP address.

A data mining subunit 232, configured to perform data mining on the payload packets of the same sequence in the service data groups to obtain service features of the multiple service data groups, and to use the obtained service features as the first feature of the unidentified data.

The data mining subunit 232 is configured to apply a data mining algorithm to the payload packets of the same sequence in each service data group to obtain the service feature of that group. A service feature is a common feature shared by at least a preset proportion of the service data in the group; in other words, at least that proportion of the group's service data contains the feature. The preset proportion is configured in advance to ensure the accuracy of the subsequently generated DPI rules and can be set as required, for example to 90% or 95%.

In this embodiment, the grouping subunit 231 divides the service data into multiple service data groups, and the data mining subunit 232 performs data mining on the payload packets of the same sequence in the service data groups to obtain the service features of the multiple groups. Grouping the service data and extracting its service features by data mining improves the accuracy of those features.
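The grouping performed by the grouping subunit can be illustrated with a short helper that splits a list of user flows into groups of N. The function name `group_user_flows` and the parameter `flows_per_group` are hypothetical, and the sketch assumes the flows are grouped in arrival order, which the text does not specify.

```python
def group_user_flows(user_flows, flows_per_group=1):
    """Split user flows into consecutive groups of flows_per_group flows.

    With the default of 1, each user flow forms its own service data group;
    the last group may be smaller when the count does not divide evenly.
    """
    if flows_per_group < 1:
        raise ValueError("flows_per_group must be at least 1")
    return [
        user_flows[i:i + flows_per_group]
        for i in range(0, len(user_flows), flows_per_group)
    ]
```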

An embodiment of the present invention further provides a computer-readable storage medium storing program instructions which, when executed by a processor, implement the DPI rule generation method provided by the embodiments of the present invention.

Industrial Applicability

The method provided by the embodiments of the present invention extracts unidentified data from the acquired Internet data, analyzes the service features of the unidentified data, compiles new DPI rules from those features, and updates the DPI rule base with the newly generated rules. The DPI rule base is thus updated automatically and in real time, which avoids the situation in which its rules cannot accurately identify service data in Internet data and improves the recognition rate and accuracy of data identification.

Claims (11)

1. A method for generating DPI (deep service identification) rules, comprising the following steps:
acquiring Internet data, and identifying the Internet data based on existing DPI rules in a DPI rule base;
when the Internet data contains unidentified data, analyzing the unidentified data to obtain a first feature of the unidentified data, wherein the unidentified data is the Internet data that the DPI rules cannot identify;
compiling and generating a new DPI rule based on the first feature; and
storing the new DPI rule in the DPI rule base.

2. The method for generating DPI rules according to claim 1, wherein the step of analyzing the unidentified data to obtain the first feature of the unidentified data when the Internet data contains unidentified data comprises:
when the Internet data contains unidentified data, acquiring a second feature of the unidentified data;
filtering the unidentified data based on the second feature to obtain service data;
analyzing the service data to obtain a service feature of the service data; and
using the obtained service feature as the first feature of the unidentified data.

3. The method for generating DPI rules according to claim 2, wherein, between the step of acquiring the second feature of the unidentified data when the Internet data contains unidentified data and the step of filtering the unidentified data based on the second feature to obtain service data, the method further comprises:
acquiring a first target IP address and/or first user data corresponding to the second feature;
when the first target IP address is inconsistent with a second target IP address corresponding to the second feature, updating the first target IP address in the unidentified data with the second target IP address;
and/or, when the first user data is inconsistent with second user data corresponding to the second feature, updating the first user data in the unidentified data with the second user data.

4. The method for generating DPI rules according to claim 2, wherein the step of analyzing the service data to obtain the service feature of the service data comprises:
dividing the service data into multiple service data groups;
performing data mining on the payload packets of the same sequence in each service data group to obtain the service feature of each service data group; and
using the obtained service features as the first feature of the unidentified data.

5. The method for generating DPI rules according to any one of claims 1 to 4, wherein the step of storing the new DPI rule in the DPI rule base comprises:
determining whether the generated new DPI rule conflicts with an existing DPI rule in the DPI rule base; and
when the generated new DPI rule does not conflict with the existing DPI rules in the DPI rule base, storing the new DPI rule in the DPI rule base.

6. A device for generating DPI rules, comprising:
an identification module, configured to acquire Internet data and identify the Internet data based on existing DPI rules in a DPI rule base;
an analysis module, configured to analyze unidentified data, when the Internet data contains unidentified data, to obtain a first feature of the unidentified data, wherein the unidentified data is the Internet data that the DPI rules cannot identify;
a compiling module, configured to compile and generate a new DPI rule based on the first feature; and
a storage module, configured to store the new DPI rule in the DPI rule base.

7. The device for generating DPI rules according to claim 6, wherein the analysis module comprises:
a first acquiring unit, configured to acquire a second feature of the unidentified data when the Internet data contains unidentified data;
a filtering unit, configured to filter the unidentified data based on the second feature to obtain service data; and
an analyzing unit, configured to analyze the service data to obtain a service feature of the service data and to use the obtained service feature as the first feature of the unidentified data.

8. The device for generating DPI rules according to claim 7, wherein the analysis module further comprises:
a second acquiring unit, configured to acquire a first target IP address and/or first user data corresponding to the second feature; and
an updating unit, configured to update the first target IP address in the unidentified data with a second target IP address corresponding to the second feature when the first target IP address is inconsistent with the second target IP address; and/or configured to update the first user data in the unidentified data with second user data corresponding to the second feature when the first user data is inconsistent with the second user data.

9. The device for generating DPI rules according to claim 7, wherein the analyzing unit comprises:
a grouping subunit, configured to divide the service data into multiple service data groups; and
a data mining subunit, configured to perform data mining on the payload packets of the same sequence in each service data group to obtain service features of the multiple service data groups, and to use the obtained service features as the first feature of the unidentified data.

10. The device for generating DPI rules according to any one of claims 6 to 9, wherein the storage module comprises:
a determining unit, configured to determine whether the generated new DPI rule conflicts with an existing DPI rule in the DPI rule base; and
a storage unit, configured to store the new DPI rule in the DPI rule base when the generated new DPI rule does not conflict with the existing DPI rules in the DPI rule base.

11. A computer-readable storage medium storing program instructions which, when executed by a processor, implement the method according to any one of claims 1 to 5.
PCT/CN2016/072175 2015-05-18 2016-01-26 Method and device for generating a dpi rules Ceased WO2016184163A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201510254257.4 2015-05-18
CN201510254257.4A CN106301825B (en) 2015-05-18 2015-05-18 DPI rule generation method and device

Publications (1)

Publication Number Publication Date
WO2016184163A1 true WO2016184163A1 (en) 2016-11-24

Family

ID=57319341

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2016/072175 Ceased WO2016184163A1 (en) 2015-05-18 2016-01-26 Method and device for generating a dpi rules

Country Status (2)

Country Link
CN (1) CN106301825B (en)
WO (1) WO2016184163A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115766603A (en) * 2022-11-16 2023-03-07 上海安博通信息科技有限公司 A processing method, device and processing equipment for applying a traffic feature recognition strategy

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106953792A (en) * 2017-02-15 2017-07-14 北京浩瀚深度信息技术股份有限公司 The instant messaging business recognition method and server added up based on weak feature
CN109639593B (en) * 2018-12-24 2022-08-12 南京中孚信息技术有限公司 Upgrading method and device of deep packet analysis system
CN110708215B (en) * 2019-10-10 2024-06-14 深圳市网心科技有限公司 Deep packet inspection rule base generation method, device, network equipment and storage medium
CN110990669A (en) * 2019-10-16 2020-04-10 广州丰石科技有限公司 DPI (deep packet inspection) analysis method and system based on rule generation
CN113010500B (en) * 2019-12-18 2024-06-14 天翼云科技有限公司 Processing method and processing system for DPI data
CN113067743B (en) * 2020-01-02 2022-12-13 中国移动通信有限公司研究院 Flow rule extraction method, device, system and storage medium
CN114338601A (en) * 2020-09-30 2022-04-12 中兴通讯股份有限公司 Unknown domain name identification method, computer equipment and storage medium
CN114598659B (en) * 2020-11-19 2024-07-05 华为技术有限公司 Rule base optimization method and device
CN113055388B (en) * 2021-03-16 2022-06-03 烽火通信科技股份有限公司 Deep packet detection method and system based on generation countermeasure network
CN114826956B (en) * 2022-03-30 2023-05-26 杭州迪普科技股份有限公司 Automatic DPI policy library file generation method and device for DPI test equipment

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2006063052A1 (en) * 2004-12-07 2006-06-15 Nortel Networks Limited Method and apparatus for network immunization
US20090252148A1 (en) * 2008-04-03 2009-10-08 Alcatel Lucent Use of DPI to extract and forward application characteristics
CN102045363A (en) * 2010-12-31 2011-05-04 成都市华为赛门铁克科技有限公司 Establishment, identification control method and device for network flow characteristic identification rule
WO2012170590A1 (en) * 2011-06-09 2012-12-13 Gfk Holding, Inc., Legal Services And Transactions Method for generating rules and parameters for assessing relevance of information derived from internet traffic

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104113571A (en) * 2013-04-18 2014-10-22 北京恒华伟业科技股份有限公司 Data collision processing method and device
CN103516727A (en) * 2013-09-30 2014-01-15 重庆电子工程职业学院 Network active defense system and updating method thereof
CN104486143B (en) * 2014-12-01 2018-07-06 中国联合网络通信集团有限公司 A kind of deep message detection method, detecting system

Also Published As

Publication number Publication date
CN106301825A (en) 2017-01-04
CN106301825B (en) 2020-10-16

Similar Documents

Publication Publication Date Title
WO2016184163A1 (en) Method and device for generating a dpi rules
JP6410547B2 (en) Malware classification by order of network behavior artifacts
CN105808284B (en) A kind of method for updating increment and the server using it
CN104657396B (en) Data migration method and device
US20100118267A1 (en) Finding sequential matches in eye tracking data
JP2021500658A (en) Computer implementation methods, systems, and computer program products that perform interactive workflows, as well as computer programs.
JP6103325B2 (en) Method, apparatus and system for acquiring user behavior
WO2015165296A1 (en) Method and device for identifying protocol type
WO2017000761A1 (en) Method and apparatus for extracting feature information of terminal device
CN108959929B (en) Program file processing method and device
CN105704177A (en) UA identification method and device
CN106528894A (en) Method and device for setting label information
JP7024255B2 (en) Information processing equipment and programs
WO2016180193A1 (en) Method and apparatus for identifying application installation package
CN109905292A (en) A terminal device identification method, system and storage medium
CN113055420B (en) HTTPS service identification method, device and computing equipment
CN106878311B (en) HTTP message rewriting method and device
CN112887451A (en) Domain name resolution method and device and computer equipment
CN104077422B (en) Download the De-weight method and device of APK
US10445080B2 (en) Methods for adaptive placement of applications and devices thereof
CN105634863A (en) Application protocol detection method and device
CN107517237B (en) A video recognition method and device
WO2018149399A1 (en) Application download counting method, readable storage medium, terminal apparatus and device
JP6064881B2 (en) Setting support program, setting support apparatus, and setting support method
CN106815247B (en) Uniform resource locator obtaining method and device

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16795650

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 16795650

Country of ref document: EP

Kind code of ref document: A1