
CN111222137A - Program classification model training method, program classification method and device - Google Patents


Info

Publication number
CN111222137A
Authority
CN
China
Prior art keywords
feature
fusion
value
dynamic
program
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN201811419260.7A
Other languages
Chinese (zh)
Inventor
焦丽娟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd
Priority to CN201811419260.7A
Priority to PCT/CN2019/119587 (WO2020108357A1)
Publication of CN111222137A
Legal status: Withdrawn (current)

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55 Detecting local intrusion or implementing counter-measures
    • G06F21/56 Computer malware detection or handling, e.g. anti-virus arrangements
    • G06F21/562 Static detection
    • G06F21/566 Dynamic detection, i.e. detection performed at run-time, e.g. emulation, suspicious activities
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/24 Classification techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Computer Hardware Design (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Virology (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract



The embodiments of this application disclose a program classification model training method, a program classification method, and an apparatus, which improve the accuracy of identifying the categories of unknown programs. The program classification model training method includes: receiving a plurality of input sample programs; for each selected sample program of the plurality of sample programs, obtaining the feature values of its static features and of its dynamic features; obtaining the feature value of at least one candidate fusion feature of the selected sample program according to those static feature values, dynamic feature values, and fusion operation rules; for each candidate fusion feature, determining an evaluation value of the candidate fusion feature according to its feature values in the sample programs and the categories of the sample programs; selecting a target fusion feature from the at least one candidate fusion feature according to the evaluation value of each candidate fusion feature; and training a program classification model according to the feature value of the target fusion feature in each sample program.


Description

Program classification model training method, program classification method and device
Technical Field
The present application relates to the field of computers, and in particular, to a program classification model training method, a program classification method, and an apparatus.
Background
Program classification, that is, identifying what kind of program a given program is, is an important requirement in the computer field; a typical task is determining whether a program is a normal program or a malicious program. A malicious program is a program with attack intent: it may disrupt the normal functions of a computer system, causing the system to malfunction or even crash, and malicious programs have therefore long been a significant threat to information security. If a program can be identified as malicious in advance, it can be handled accordingly and its impact on the computer system reduced.
Currently, a common program classification approach first trains a program classification model on programs of known categories and then uses the trained model to classify programs of unknown category, for example as malicious or normal. Both training and classification require extracting features from the program. There are two main feature extraction approaches. One extracts static features of the program, which are derived from its structural characteristics. The other extracts dynamic features, which are behavioral characteristics exhibited while the program runs.
However, although a model trained on static features can identify the categories of some programs, once a programmer modifies a program in some way, for example by packing it (adding a shell) or by applying metamorphic or polymorphic transformations, its category can no longer be identified. Dynamic features are extracted by running the program in a virtual environment such as a sandbox. If the unknown program contains anti-virtual-environment logic (for example, it skips certain commands when it detects that it is running in a sandbox), the sandbox cannot accurately extract its dynamic features, and the program classification model cannot accurately identify the program's category.
Therefore, improving identification accuracy for programs of unknown category is a technical problem that currently needs to be solved.
Disclosure of Invention
The embodiments of this application provide a program classification model training method, a program classification method, and a program classification apparatus, which improve the accuracy of identifying the categories of unknown programs.
In a first aspect, an embodiment of this application provides a method for training a program classification model. The method includes the following steps. First, a plurality of sample programs is received. A sample program is a program whose category has been labeled in advance, and the sample programs belong to at least two different categories: for example, a normal-program category and a malicious-program category, or at least two different categories of malware. Second, a sample program is selected from the plurality of sample programs and the following processing is performed to obtain the feature value of at least one candidate fusion feature of that sample program, repeating until every sample program has been processed. According to a preset static feature set containing at least one static feature and a preset dynamic feature set containing at least one dynamic feature, the feature value of each static feature and of each dynamic feature of the selected sample program is obtained. Then the feature value of at least one candidate fusion feature of the selected sample program is obtained from its static feature values, its dynamic feature values, and at least one fusion operation rule. The feature value of each candidate fusion feature is obtained from its corresponding fusion operation rule, which specifies a fusion operation to perform on the feature value of a designated static feature in the preset static feature set and the feature value of a designated dynamic feature in the preset dynamic feature set.
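The per-sample loop above can be sketched in Python. This is a minimal illustration under stated assumptions, not the applicant's implementation: each sample program is represented as a pair of dictionaries (static and dynamic feature values keyed by feature name), and the `FusionRule` class, the feature names, and the product used as the fusion operation are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

FeatureValues = Dict[str, float]


@dataclass(frozen=True)
class FusionRule:
    """Hypothetical fusion operation rule: combine one designated static
    feature value with one designated dynamic feature value."""
    name: str
    static_feature: str
    dynamic_feature: str
    combine: Callable[[float, float], float]


def candidate_fusion_values(
    samples: List[Tuple[FeatureValues, FeatureValues]],
    rules: List[FusionRule],
) -> List[FeatureValues]:
    """Return, for each sample program, the value of every candidate fusion feature."""
    table = []
    for static_vals, dynamic_vals in samples:  # process each sample program in turn
        row = {
            rule.name: rule.combine(
                static_vals[rule.static_feature], dynamic_vals[rule.dynamic_feature]
            )
            for rule in rules
        }
        table.append(row)
    return table


rules = [FusionRule("net_import_x_connect", "imports_ws2_32", "connect", lambda s, d: s * d)]
samples = [({"imports_ws2_32": 1}, {"connect": 0.5}), ({"imports_ws2_32": 0}, {"connect": 0.25})]
print(candidate_fusion_values(samples, rules))
# [{'net_import_x_connect': 0.5}, {'net_import_x_connect': 0.0}]
```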
Third, the following processing is performed for a first candidate fusion feature of the at least one candidate fusion feature, and likewise for each of the others, to obtain an evaluation value for every candidate fusion feature: the evaluation value of the first candidate fusion feature is determined from its feature value in each sample program and the category of each sample program. The magnitude of the evaluation value indicates how effective the first candidate fusion feature is at distinguishing the categories to which the sample programs belong. Next, a target fusion feature is selected from the at least one candidate fusion feature according to these evaluation values; the evaluation value of the target fusion feature indicates greater effectiveness than those of the other candidate fusion features. Finally, a program classification model is trained on the feature value of the target fusion feature in each sample program.
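Selecting the target fusion feature can be sketched as a ranking over evaluation values. The helper names and the `k` parameter are hypothetical; the text selects the feature whose evaluation value indicates the greatest effectiveness, which corresponds to `k=1` here, and the sketch assumes a larger evaluation value means a more effective feature.

```python
from typing import Dict, List


def select_target_features(evaluations: Dict[str, float], k: int = 1) -> List[str]:
    """Pick the k candidate fusion features whose evaluation values indicate the
    greatest effectiveness (assumes larger means more effective)."""
    # Sort by descending evaluation value; break ties by name so the result is deterministic.
    ranked = sorted(evaluations, key=lambda name: (-evaluations[name], name))
    return ranked[:k]


def project(rows: List[Dict[str, float]], targets: List[str]) -> List[List[float]]:
    """Keep only the target fusion feature values of each sample program, in a fixed order."""
    return [[row[t] for t in targets] for row in rows]


evaluations = {"f_mul": 0.12, "f_and": 0.30, "f_overlap": 0.05}
targets = select_target_features(evaluations, k=2)
print(targets)  # ['f_and', 'f_mul']
print(project([{"f_mul": 0.5, "f_and": 1, "f_overlap": 0.2}], targets))  # [[1, 0.5]]
```

The projected rows are what a training routine would consume as model input.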
The static features reflect the structural characteristics of the selected sample program, such as dynamic link library files, optional header information, resource information, section header information, data directory table information, image file header, appended data, exception structure fields, entry point information, and executable code sections. If a static feature has a concrete value in the selected sample program (as optional header information, resource information, section header information, data directory table information, and the image file header do), that value can be used directly as the feature value. If the sample program itself provides no concrete value, the feature value is determined from how the sample program actually behaves; for example, the feature value is 1 if the dynamic link library file is loaded and 0 otherwise. The dynamic features reflect the behavior of the selected sample program while it runs, for example at least one interface called by the sample program at run time and/or a parameter model extracted from the parameters the sample program uses at run time. Suppose the at least one dynamic feature includes a third dynamic feature; the feature value of the third dynamic feature for the selected sample program may be its frequency, defined as the ratio of the number of times the third dynamic feature appears in the selected sample program to the total number of dynamic features in the preset dynamic feature set.
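The two ways of obtaining a static feature value described above (use a concrete value when one exists, otherwise derive 0/1 from the program's actual behavior) can be sketched as follows. The `parsed` dictionary stands in for the output of some PE parser and is an assumption, as are the feature names and the choice of 0 for a feature absent from the program.

```python
from typing import Dict, List, Union

ParsedValue = Union[bool, int, float, None]


def static_feature_values(parsed: Dict[str, ParsedValue], preset_static: List[str]) -> Dict[str, float]:
    """Map each preset static feature to a numeric feature value.

    `parsed` is a hypothetical dictionary from some PE parser: numeric
    fields (e.g. a header field) carry a concrete value, while behavioural
    facts (e.g. whether a DLL is loaded) are booleans.
    """
    values: Dict[str, float] = {}
    for name in preset_static:
        v = parsed.get(name)
        if isinstance(v, bool):  # check bool first: bool is a subclass of int
            values[name] = 1 if v else 0
        elif isinstance(v, (int, float)):  # concrete value: use it directly
            values[name] = v
        else:  # feature absent from this program (assumed to mean 0)
            values[name] = 0
    return values


parsed = {"SizeOfCode": 4096, "loads:ws2_32.dll": True}
preset = ["SizeOfCode", "loads:ws2_32.dll", "loads:advapi32.dll"]
print(static_feature_values(parsed, preset))
# {'SizeOfCode': 4096, 'loads:ws2_32.dll': 1, 'loads:advapi32.dll': 0}
```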
Because the program classification model is trained on, and later classifies programs by, the feature values of fusion features, this approach benefits from both kinds of features: dynamic features are not tied to the program's form, and when a program carries anti-virtual-environment logic, static features compensate for incomplete dynamic feature extraction. Compared with the prior art, training the model on fusion feature values therefore improves the accuracy of program category identification.
Optionally, determining the evaluation value of the first candidate fusion feature from its feature value in each sample program and the category of each sample program includes: grouping the sample programs by category and computing statistics of the first candidate fusion feature's values within each category, such as the median, mean, or variance, to obtain the statistic of the first candidate fusion feature for each category; and then determining the evaluation value of the first candidate fusion feature from these per-category statistics. The evaluation value may be, for example, the ratio, difference, or variance of the per-category statistics.
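One possible evaluation value following this optional scheme is the variance of the per-category means: a feature whose mean differs strongly between categories scores high. This is one choice among the statistics the text lists, not a prescribed formula, and the function name is hypothetical.

```python
from collections import defaultdict
from statistics import mean, pvariance
from typing import Dict, Hashable, List


def evaluation_value(values: List[float], labels: List[Hashable]) -> float:
    """Score one candidate fusion feature: compute its mean within each
    category, then the population variance of those means across categories.
    A larger score means the categories' values lie further apart."""
    by_category: Dict[Hashable, List[float]] = defaultdict(list)
    for value, label in zip(values, labels):
        by_category[label].append(value)
    category_means = [mean(vs) for vs in by_category.values()]
    return pvariance(category_means)


labels = ["malicious", "malicious", "normal", "normal"]
print(evaluation_value([0.9, 0.8, 0.1, 0.2], labels))  # about 0.1225: separates the categories
print(evaluation_value([0.5, 0.5, 0.5, 0.5], labels))  # 0.0: no separation
```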
In practice, there are several ways to obtain the feature value of the at least one candidate fusion feature of the selected sample program from its static feature values, its dynamic feature values, and the at least one fusion operation rule.
As one possible implementation, the at least one candidate fusion feature includes a first candidate fusion feature, whose feature value is obtained based on a corresponding first fusion operation rule. The first fusion operation rule specifies a mathematical operation, such as multiplication, addition, or subtraction, on the feature value of a first static feature in the preset static feature set and the feature value of a first dynamic feature in the preset dynamic feature set. The first static feature includes one or more static features, and the first dynamic feature includes one or more dynamic features.
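The first (mathematical) fusion rule can be sketched as a left fold of the chosen operation over the designated static values followed by the designated dynamic values. The operand order, the operation names, and the feature names in the example are assumptions; with subtraction the order matters.

```python
import operator
from functools import reduce
from typing import Dict, List

# Operations named in the text; the string keys are illustrative.
MATH_OPS = {"mul": operator.mul, "add": operator.add, "sub": operator.sub}


def math_fusion(static_vals: Dict[str, float], dynamic_vals: Dict[str, float],
                static_names: List[str], dynamic_names: List[str], op: str) -> float:
    """First fusion rule: fold a mathematical operation over the designated
    static feature values followed by the designated dynamic feature values."""
    operands = [static_vals[n] for n in static_names] + [dynamic_vals[n] for n in dynamic_names]
    return reduce(MATH_OPS[op], operands)


static = {"imports_ws2_32": 1, "has_overlay": 1}
dynamic = {"connect": 0.25, "send": 0.5}
print(math_fusion(static, dynamic, ["imports_ws2_32"], ["connect"], "mul"))  # 0.25
print(math_fusion(static, dynamic, ["imports_ws2_32", "has_overlay"], ["connect", "send"], "add"))  # 2.75
```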
As another possible implementation, the at least one candidate fusion feature includes a second candidate fusion feature, whose feature value is obtained based on a corresponding second fusion operation rule. The second fusion operation rule specifies a logical operation, such as AND, OR, or NAND, on the feature value of a second static feature in the preset static feature set and the feature value of a second dynamic feature in the preset dynamic feature set. The second static feature includes one or more static features, and the second dynamic feature includes one or more dynamic features.
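The second (logical) fusion rule can be sketched the same way. The text does not say how non-binary values such as frequencies enter a logical operation; this sketch assumes any non-zero value counts as true, and the feature names are hypothetical.

```python
from typing import Dict, List


def logic_fusion(static_vals: Dict[str, float], dynamic_vals: Dict[str, float],
                 static_names: List[str], dynamic_names: List[str], op: str) -> int:
    """Second fusion rule: apply a logical operation to the designated feature
    values, treating any non-zero value as true (an assumption)."""
    bits = [static_vals[n] != 0 for n in static_names] + [dynamic_vals[n] != 0 for n in dynamic_names]
    if op == "and":
        result = all(bits)
    elif op == "or":
        result = any(bits)
    elif op == "nand":
        result = not all(bits)
    else:
        raise ValueError(f"unsupported logical operation: {op}")
    return int(result)


static = {"imports_advapi32": 1}
dynamic = {"RegSetValueExW": 0.0}
for op in ("and", "or", "nand"):
    print(op, logic_fusion(static, dynamic, ["imports_advapi32"], ["RegSetValueExW"], op))
# and 0 / or 1 / nand 1
```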
As another possible implementation, the at least one candidate fusion feature includes a third candidate fusion feature, whose feature value is obtained based on a corresponding third fusion operation rule. The third fusion operation rule specifies identifying, across the preset static feature set and the preset dynamic feature set, the features that are the same feature in both sets and also have the same feature value, and computing the feature value of the third candidate fusion feature from the total number of such features.
Optionally, computing the feature value of the third candidate fusion feature from the total number of features that are the same feature and have the same feature value includes: first, determining the larger of a first number and a second number, where the first number is the total number of static features in the preset static feature set and the second number is the total number of dynamic features in the preset dynamic feature set; then computing the ratio of the total number of such matching features to this maximum, and using the ratio as the feature value of the third candidate fusion feature. This ratio is only one way to compute the feature value of the third candidate fusion feature, and those skilled in the art can design other formulas as needed.
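The third fusion rule with the optional normalization, value = (number of features present in both sets with equal values) / max(|static set|, |dynamic set|), can be sketched as follows. The sketch assumes each dictionary covers its whole preset feature set and that a feature can appear in both sets under the same name (for example, a hypothetical API name recorded 0/1 both as a static import and as a run-time call).

```python
from typing import Dict


def overlap_fusion(static_vals: Dict[str, float], dynamic_vals: Dict[str, float]) -> float:
    """Third fusion rule: count features that are in both preset sets with equal
    values, divided by the larger of the two set sizes."""
    shared = static_vals.keys() & dynamic_vals.keys()  # the same feature in both sets
    matches = sum(1 for f in shared if static_vals[f] == dynamic_vals[f])  # ...with the same value
    return matches / max(len(static_vals), len(dynamic_vals))


static = {"CreateFileW": 1, "connect": 1, "SizeOfCode": 4096}
dynamic = {"CreateFileW": 1, "connect": 0, "WriteFile": 1, "send": 0}
print(overlap_fusion(static, dynamic))  # 0.25: only CreateFileW matches, max(3, 4) = 4
```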
These three implementations do not limit the technical solution of this application; those skilled in the art can design other implementations according to actual needs.
Optionally, to let the program classification model classify programs of unknown category more accurately, training the model according to the feature value of the target fusion feature in each sample program includes: training the program classification model according to the feature value of the target fusion feature in each sample program together with the feature value of at least one static feature and the feature value of at least one dynamic feature of each sample program.
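Training on the concatenated feature values can be sketched as below. The text does not name a model type, so a nearest-centroid classifier is swapped in purely as a self-contained stand-in; `build_vector`, `NearestCentroid`, and the toy data are hypothetical.

```python
from math import dist
from statistics import fmean
from typing import Dict, Hashable, List, Sequence


def build_vector(fusion: Sequence[float], static: Sequence[float], dynamic: Sequence[float]) -> List[float]:
    """Concatenate target fusion, static and dynamic feature values into one input vector."""
    return [*fusion, *static, *dynamic]


class NearestCentroid:
    """Stand-in classifier: predicts the category whose mean training vector
    is closest in Euclidean distance."""

    def fit(self, X: List[List[float]], y: List[Hashable]) -> "NearestCentroid":
        groups: Dict[Hashable, List[List[float]]] = {}
        for x, label in zip(X, y):
            groups.setdefault(label, []).append(x)
        # Column-wise mean of each category's vectors.
        self.centroids = {label: [fmean(col) for col in zip(*rows)] for label, rows in groups.items()}
        return self

    def predict(self, x: List[float]) -> Hashable:
        return min(self.centroids, key=lambda label: dist(x, self.centroids[label]))


X = [build_vector([0.5], [1], [0.5]), build_vector([0.25], [1], [0.25]),
     build_vector([0.0], [0], [0.25]), build_vector([0.0], [0], [0.0])]
y = ["malicious", "malicious", "normal", "normal"]
model = NearestCentroid().fit(X, y)
print(model.predict(build_vector([0.4], [1], [0.4])))  # malicious
```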
In a second aspect, an embodiment of this application provides a program classification method. The method includes: first, obtaining a target program. Second, obtaining the feature value of each static feature and of each dynamic feature of the target program according to a preset static feature set including at least one static feature and a preset dynamic feature set including at least one dynamic feature. Third, obtaining the feature value of at least one target fusion feature of the target program, where each such value is obtained based on a corresponding fusion operation rule that specifies a fusion operation on the feature value of a designated static feature in the preset static feature set and the feature value of a designated dynamic feature in the preset dynamic feature set. Finally, inputting the feature value of the at least one target fusion feature of the target program into the program classification model to obtain a classification result for the target program: for example, that the target program is a normal program or a malicious program, or that it belongs to one of several malware categories.
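The classification steps above can be sketched as follows, assuming the target fusion rules chosen during training are available as (static feature, dynamic feature, combining function) triples and the trained model is any callable. The rule, the threshold model, and the feature names are hypothetical stand-ins.

```python
from typing import Callable, Dict, List, Tuple

# A target fusion rule: (static feature name, dynamic feature name, combining function).
TargetRule = Tuple[str, str, Callable[[float, float], float]]


def classify_program(static_vals: Dict[str, float], dynamic_vals: Dict[str, float],
                     target_rules: List[TargetRule],
                     model: Callable[[List[float]], str]) -> str:
    """Compute the target fusion feature values of one program and feed them to a trained model."""
    fused = [combine(static_vals[s], dynamic_vals[d]) for s, d, combine in target_rules]
    return model(fused)


def toy_model(v: List[float]) -> str:
    """Hypothetical trained model: a single threshold on the first fused value."""
    return "malicious" if v[0] > 0.2 else "normal"


rules = [("imports_ws2_32", "connect", lambda s, d: s * d)]
print(classify_program({"imports_ws2_32": 1}, {"connect": 0.5}, rules, toy_model))  # malicious
```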
The static features reflect the structural characteristics of the target program, such as dynamic link library files, optional header information, resource information, section header information, data directory information, image file header, appended data, exception structure fields, entry point information, and executable code sections. Static feature values can be extracted directly from the target program or derived from its observed behavior. The dynamic features are behavioral characteristics of the target program while it runs, such as parameter models and preset interfaces. If the at least one dynamic feature of the target program includes parameter models and preset interfaces, obtaining the dynamic feature values of the target program includes: obtaining the preset interfaces called by the target program while running and the parameters it used; extracting parameter models from those parameters; and selecting a third dynamic feature from the at least one dynamic feature of the target program and using its frequency as its feature value, and so on, until the feature values of all dynamic features of the target program are obtained. The frequency of the third dynamic feature is the ratio of the number of times it occurs in the target program to the total number of dynamic features in the preset dynamic feature set.
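Extracting dynamic feature values from a run trace can be sketched as follows. The text does not define what a parameter model looks like; `parameter_model` below is a hypothetical abstraction that maps concrete arguments to coarse patterns, and the trace format, API names, and preset feature names are likewise assumptions. The frequency follows the definition above: occurrences divided by the size of the preset dynamic feature set.

```python
import re
from collections import Counter
from typing import Dict, List, Tuple


def parameter_model(param: str) -> str:
    """Hypothetical parameter model: abstract a concrete argument into a coarse pattern."""
    if re.match(r"(?i)c:\\windows\\", param):
        return "<SYSTEM_PATH>"
    if re.fullmatch(r"\d{1,3}(?:\.\d{1,3}){3}", param):
        return "<IPV4>"
    if param.isdigit():
        return "<NUMBER>"
    return "<STRING>"


def dynamic_feature_values(trace: List[Tuple[str, List[str]]], preset: List[str]) -> Dict[str, float]:
    """Frequency of each preset dynamic feature: occurrences in this program's
    run divided by the number of features in the preset dynamic feature set."""
    observed: List[str] = []
    for api, params in trace:
        observed.append(api)  # the interface called
        observed.extend(f"{api}:{parameter_model(p)}" for p in params)  # interface + parameter model
    counts = Counter(observed)
    return {feature: counts[feature] / len(preset) for feature in preset}


trace = [("CreateFileW", ["C:\\Windows\\System32\\drivers\\etc\\hosts"]),
         ("connect", ["10.0.0.5"]), ("connect", ["10.0.0.6"])]
preset = ["CreateFileW", "CreateFileW:<SYSTEM_PATH>", "connect", "connect:<IPV4>"]
print(dynamic_feature_values(trace, preset))
# {'CreateFileW': 0.25, 'CreateFileW:<SYSTEM_PATH>': 0.25, 'connect': 0.5, 'connect:<IPV4>': 0.5}
```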
In this method, what is input into the program classification model is the feature value of a target fusion feature obtained by fusing the feature values of designated static and dynamic features of the target program, rather than static feature values alone or dynamic feature values alone. This combines two advantages: dynamic features can identify programs whose form has been changed, and static features, which reflect the program's structure, can identify programs that carry anti-virtual-environment logic. As a result, the program classification model classifies the target program more accurately.
In practice, there are several ways to obtain the feature value of the at least one target fusion feature of the target program based on the corresponding fusion operation rule.
As one possible implementation, the at least one target fusion feature includes a first target fusion feature, whose feature value is obtained based on a corresponding first fusion operation rule. The first fusion operation rule specifies a mathematical operation, such as multiplication, addition, or subtraction, on the feature value of a first static feature in the preset static feature set and the feature value of a first dynamic feature in the preset dynamic feature set. The first static feature includes one or more static features, and the first dynamic feature includes one or more dynamic features.
As another possible implementation, the at least one target fusion feature includes a second target fusion feature, whose feature value is obtained based on a corresponding second fusion operation rule. The second fusion operation rule specifies a logical operation, such as AND, OR, or NAND, on the feature value of a second static feature in the preset static feature set and the feature value of a second dynamic feature in the preset dynamic feature set. The second static feature includes one or more static features, and the second dynamic feature includes one or more dynamic features.
As another possible implementation, the at least one target fusion feature includes a third target fusion feature, whose feature value is obtained based on a corresponding third fusion operation rule. The third fusion operation rule specifies identifying, across the preset static feature set and the preset dynamic feature set, the features that are the same feature in both sets and also have the same feature value, and computing the feature value of the third target fusion feature from the total number of such features.
Optionally, computing the feature value of the third target fusion feature from the total number of features that are the same feature and have the same feature value includes: determining the larger of a first number and a second number, where the first number is the total number of static features in the preset static feature set and the second number is the total number of dynamic features in the preset dynamic feature set; and computing the ratio of the total number of such matching features to this maximum, and using the ratio as the feature value of the third target fusion feature. This ratio is only one way to compute the feature value of the third target fusion feature, and those skilled in the art can design other formulas as needed.
These three implementations do not limit the technical solution of this application; those skilled in the art can design other implementations according to actual needs.
Optionally, there are multiple target programs. If the program classification model cannot identify their categories, the method further includes: clustering the plurality of target programs according to the feature value of at least one target fusion feature of each target program, to obtain the category of each target program.
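The text does not name a clustering algorithm; plain k-means (Lloyd's algorithm) is used below only as an example. The cluster count `k`, the first-k-points initialization, and the data are assumptions, and the resulting cluster indices are group labels rather than named categories, which an analyst would still need to interpret.

```python
from math import dist
from statistics import fmean
from typing import List


def kmeans(points: List[List[float]], k: int, iterations: int = 20) -> List[int]:
    """Plain Lloyd's k-means; the initial centroids are the first k points (a simplification)."""
    centroids = [list(p) for p in points[:k]]
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assign each program to its nearest centroid.
        assignment = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]
        # Move each centroid to the mean of its members (keep it if the cluster is empty).
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = [fmean(col) for col in zip(*members)]
    return assignment


# Each point holds the target fusion feature value(s) of one unclassified target program.
points = [[0.1], [0.15], [0.9], [0.95]]
print(kmeans(points, k=2))  # [0, 0, 1, 1]
```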
In a third aspect, an embodiment of the present application further provides a device for training a program classification model, where the device includes:
a receiving unit, configured to receive a plurality of input sample programs, where a sample program is a program whose category has been labeled in advance, and the plurality of sample programs belong to at least two different categories;
a first processing unit, configured to select a sample program from the plurality of sample programs, and perform the following processing to obtain a feature value of at least one candidate fusion feature of the selected sample program until each sample program of the plurality of sample programs is processed:
obtaining, according to a preset static feature set including at least one static feature and a preset dynamic feature set including at least one dynamic feature, the feature value of each static feature and the feature value of each dynamic feature of the selected sample program, where the static features reflect structural characteristics of the selected sample program and the dynamic features reflect its behavior while running; and
obtaining the feature value of at least one candidate fusion feature of the selected sample program according to the feature value of the at least one static feature, the feature value of the at least one dynamic feature, and at least one fusion operation rule, where the feature value of each candidate fusion feature is obtained based on its corresponding fusion operation rule, and the fusion operation rule specifies a fusion operation on the feature value of a designated static feature in the preset static feature set and the feature value of a designated dynamic feature in the preset dynamic feature set;
a second processing unit, configured to perform the following processing for a first candidate fusion feature of the at least one candidate fusion feature, and likewise for each of the others, to obtain an evaluation value for each candidate fusion feature: determining the evaluation value of the first candidate fusion feature according to its feature value in each sample program and the category of each sample program, where the magnitude of the evaluation value indicates how effective the first candidate fusion feature is at distinguishing the categories to which the sample programs belong;
a selection unit, configured to select a target fusion feature from the at least one candidate fusion feature according to the evaluation value of each candidate fusion feature, where the evaluation value of the target fusion feature indicates greater effectiveness than the evaluation values of the other candidate fusion features in the at least one candidate fusion feature; and
a training unit, configured to train a program classification model according to the feature value of the target fusion feature in each sample program.
Optionally, determining the evaluation value of the first candidate fusion feature according to the feature value of the first candidate fusion feature in each sample program and the category of each sample program includes:
grouping the sample programs by category and computing statistics of the first candidate fusion feature's values within each category, to obtain the statistic of the first candidate fusion feature for each category; and determining the evaluation value of the first candidate fusion feature according to these per-category statistics.
Optionally, the at least one candidate fusion feature includes a first candidate fusion feature, and the feature value of the first candidate fusion feature is obtained based on a corresponding first fusion operation rule;
the fusion operation that each fusion operation rule specifies on the feature values of the designated static features in the preset static feature set and the feature values of the designated dynamic features in the preset dynamic feature set includes:
the first fusion operation rule specifies a mathematical operation on the feature value of a first static feature in the preset static feature set and the feature value of a first dynamic feature in the preset dynamic feature set.
Optionally, the at least one candidate fusion feature includes a second candidate fusion feature, and a feature value of the second candidate fusion feature is obtained based on a corresponding second fusion operation rule;
the fusion operation performed, as indicated by each fusion operation rule, on the feature values of the specified static features in the preset static feature set and the feature values of the specified dynamic features in the preset dynamic feature set includes:
the second fusion operation rule instructs that a logical operation be performed on the feature value of a second static feature in the preset static feature set and the feature value of a second dynamic feature in the preset dynamic feature set.
Optionally, the at least one candidate fusion feature includes a third candidate fusion feature, and a feature value of the third candidate fusion feature is obtained based on a corresponding third fusion operation rule;
the fusion operation performed, as indicated by each fusion operation rule, on the feature values of the specified static features in the preset static feature set and the feature values of the specified dynamic features in the preset dynamic feature set includes:
the third fusion operation rule instructs to determine, from the preset static feature set and the preset dynamic feature set, the features that appear in both sets and have the same feature value in both, and to calculate the feature value of the third candidate fusion feature according to the total number of such features.
In a fourth aspect, an embodiment of the present application further provides a program classification device, the device including:
a program acquisition unit configured to acquire a target program;
a first feature value acquisition unit, configured to acquire a feature value of each static feature and a feature value of each dynamic feature of the target program according to a preset static feature set including at least one static feature and a preset dynamic feature set including at least one dynamic feature, where a static feature reflects the structural characteristics of the target program and a dynamic feature reflects the behavior of the target program while it runs;
a second feature value obtaining unit, configured to obtain a feature value of at least one target fusion feature of the target program, where the feature value of the at least one target fusion feature of the target program is obtained based on a corresponding fusion operation rule, and the fusion operation rule indicates that a fusion operation is performed on a feature value of a specified static feature in the preset static feature set and a feature value of a specified dynamic feature in the preset dynamic feature set;
and a classification unit, configured to input the feature value of the at least one target fusion feature of the target program into the program classification model to obtain a classification result of the target program.
Optionally, the at least one target fusion feature includes a first target fusion feature, and a feature value of the first target fusion feature is obtained based on a corresponding first fusion operation rule;
the fusion operation performed, as indicated by the fusion operation rule, on the feature value of the specified static feature in the preset static feature set and the feature value of the specified dynamic feature in the preset dynamic feature set includes:
the first fusion operation rule instructs that a mathematical operation be performed on a feature value of a first static feature in the preset static feature set and a feature value of a first dynamic feature in the preset dynamic feature set.
Optionally, the at least one target fusion feature includes a second target fusion feature, and a feature value of the second target fusion feature is obtained based on a corresponding second fusion operation rule;
the fusion operation performed, as indicated by the fusion operation rule, on the feature value of the specified static feature in the preset static feature set and the feature value of the specified dynamic feature in the preset dynamic feature set includes:
the second fusion operation rule instructs that a logical operation be performed on the feature value of a second static feature in the preset static feature set and the feature value of a second dynamic feature in the preset dynamic feature set.
Optionally, the at least one target fusion feature includes a third target fusion feature, and a feature value of the third target fusion feature is obtained based on a corresponding third fusion operation rule;
the fusion operation performed, as indicated by the fusion operation rule, on the feature value of the specified static feature in the preset static feature set and the feature value of the specified dynamic feature in the preset dynamic feature set includes:
the third fusion operation rule instructs to determine, from the preset static feature set and the preset dynamic feature set, the features that appear in both sets and have the same feature value in both, and to calculate the feature value of the third target fusion feature according to the total number of such features.
Drawings
Fig. 1 is a schematic diagram of an enterprise network architecture according to an embodiment of the present application;
Fig. 2 is a schematic diagram of a cloud network architecture according to an embodiment of the present application;
Fig. 3 is a flowchart of a program classification model training method according to an embodiment of the present application;
Fig. 4 is a flowchart of a program classification method according to an embodiment of the present application;
Fig. 5 is a structural block diagram of a program classification model training apparatus according to an embodiment of the present application;
Fig. 6 is a structural block diagram of a program classification device according to an embodiment of the present application;
Fig. 7 is a hardware architecture diagram of a program classification model training apparatus according to an embodiment of the present application;
Fig. 8 is a hardware architecture diagram of a program classification apparatus according to an embodiment of the present application.
Detailed Description
To improve the accuracy of identifying the category of an unknown program, embodiments of the present application provide a program classification model training method and apparatus, and a program classification method and apparatus.
The program classification model training method and apparatus and the program classification method and apparatus provided in the embodiments of the present application may be applied, for example, to the application scenarios shown in Fig. 1 and Fig. 2.
Fig. 1 is a schematic diagram of an enterprise network architecture. In Fig. 1, the enterprise network architecture includes a security device 101, a network access device 102 such as a firewall or a security gateway, a switch 103 connected to the network access device 102, and a plurality of hosts 104 connected to the switch 103. The security device 101 is connected to the network access device 102 and may be, for example, an Intrusion Prevention System (IPS) device or a Unified Threat Management (UTM) device. The security device 101 is configured to train a program classification model, to receive test samples sent by the firewall or security gateway serving as the network access device 102 or by client software installed on an intranet host 104, and to output the category to which each test sample belongs.
Fig. 2 is a schematic diagram of a cloud network architecture. In Fig. 2, the cloud network architecture may include a security device 201 located on the core network side and a plurality of firewall devices 202 in the access network. The security device 201 may include modules such as a cloud sandbox, which are used to train a program classification model, receive test samples from the firewall devices 202, and output the category to which each test sample belongs.
The program classification model training method provided in the embodiments of the present application is described in detail below with reference to the accompanying drawings. The training method may be executed by the security device 101 in Fig. 1 or the security device 201 in Fig. 2. The workflow of the security devices 101 and 201 mainly includes a training phase and a testing phase. In the training phase, the input of the security devices 101 and 201 is a training set and the output is the generated program classification model. The training set includes a plurality of training samples, where a training sample is a sample program whose category has been labeled in advance. The security devices 101 and 201 generate the program classification model from the training set using a predetermined machine learning algorithm. In the testing phase, the input of the security devices 101 and 201 is a test sample, that is, a sample program whose category is unknown, and the output is the category to which the test sample belongs; the security devices 101 and 201 classify the test sample using the generated program classification model.
Refer to Fig. 3, which is a schematic flowchart of a program classification model training method according to an embodiment of the present application.
The program classification model training method provided by the embodiment of the application can comprise the following steps:
S101: Receive a plurality of training samples as input, each training sample being a sample program.
In the embodiments of the present application, a sample program is a program whose category has been labeled in advance, and the plurality of sample programs belong to at least two different categories. For example, the at least two categories may be normal programs and malicious programs. As another example, the at least two categories may be different categories of malicious programs, such as worms, trojans, downloaders, and backdoors. A malicious program may be, for example, a Portable Executable (PE) file.
S102: Select a sample program from the plurality of sample programs and execute S1021 and S1022 to obtain the feature value of at least one candidate fusion feature of the selected sample program, repeating until every one of the plurality of sample programs has been processed.
That is, each of the plurality of sample programs has a feature value for each of the at least one candidate fusion feature, and these feature values are calculated as described in S1021 and S1022.
S1021: Acquire the feature value of each static feature and the feature value of each dynamic feature of the selected sample program according to a preset static feature set including at least one static feature and a preset dynamic feature set including at least one dynamic feature.
In the embodiments of the present application, the preset static feature set, including at least one static feature, and the preset dynamic feature set, including at least one dynamic feature, are determined in advance. The selected sample program is then analyzed to obtain its feature value for each static feature in the preset static feature set and its feature value for each dynamic feature in the preset dynamic feature set. In other words, every sample program uses the same preset static feature set and differs only in the feature values of those static features; likewise, every sample program uses the same preset dynamic feature set and differs only in the feature values of those dynamic features.
The static features reflect the structural characteristics of the sample program. For example, the static features include one or more of the following: dynamic link library (DLL) files, image optional header information, resource information, section header information, data directory table information, the image file header, additional (overlay) information, exception structure fields, entry point information, and the executable code segment.
Each of the static feature types above may include one or more static features. For example, the DLL files may include ADVAPI32.DLL, AWFAXP32.DLL, AWFXAB32.DLL, and so on. An exception structure field is a field in which an anomaly may occur, such as a deprecated field, a default field, a reserved field, or a structure field.
If a static feature has a concrete value in the sample program, as is the case for the optional header information, resource information, section header information, data directory table information, and image file header, that value may be used directly as the feature value of the static feature. If the sample program has no concrete value for a static feature, the feature value is determined by how the feature actually manifests in the sample program. For example, if a DLL file is loaded, its feature value is 1; otherwise it is 0. For the entry point information, if the entry point is located in the executable code segment, the feature value is 0; otherwise it is 1. For the additional (overlay) information, if it exists, the feature value is 1; otherwise it is 0.
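The 0/1 encoding rules for static features without a concrete value can be sketched as follows. This is a minimal illustration, not the method of the present application: it assumes the PE file has already been parsed into a hypothetical PeSummary record, and the field names and the WATCHED_DLLS list are illustrative assumptions.

```python
from dataclasses import dataclass, field

# Hypothetical list of DLL static features; the names come from the example above.
WATCHED_DLLS = ["ADVAPI32.DLL", "AWFAXP32.DLL", "AWFXAB32.DLL"]


@dataclass
class PeSummary:
    """Hypothetical, already-parsed view of a PE file (not a real parser API)."""
    loaded_dlls: set[str] = field(default_factory=set)
    entry_point_rva: int = 0
    # (start, end) relative virtual address ranges of executable sections
    executable_ranges: list[tuple[int, int]] = field(default_factory=list)
    overlay_size: int = 0  # number of bytes appended after the last section


def static_indicator_features(pe: PeSummary) -> dict[str, int]:
    loaded = {name.upper() for name in pe.loaded_dlls}
    features = {}
    # DLL feature: 1 if the DLL is loaded, otherwise 0
    for dll in WATCHED_DLLS:
        features["dll:" + dll] = int(dll in loaded)
    # Entry point feature: 0 if the entry point lies in executable code, otherwise 1
    in_code = any(start <= pe.entry_point_rva < end for start, end in pe.executable_ranges)
    features["entry_point_outside_code"] = 0 if in_code else 1
    # Overlay feature: 1 if appended data exists, otherwise 0
    features["has_overlay"] = int(pe.overlay_size > 0)
    return features
```

Static features that do have a concrete value (for example, a field of the optional header) would simply be copied into the same dictionary.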
Optionally, to make the static features more effective for classification, selected N-gram features may also be used as static features. The N-gram is an algorithm based on a statistical language model. Its basic idea is to slide a window of size N over the content byte by byte, forming a sequence of byte fragments of length N; each byte fragment is called a gram. In the embodiments of the present application, N is greater than or equal to 2.
For example, to extract N-gram features from the executable code segment, a window of size 4 is slid byte by byte over the executable code segment to form byte fragments of length 4, and each such byte fragment may be treated as a static feature.
If the N-gram feature can be extracted from the selected sample program, the feature value of the N-gram feature can be 1; otherwise it is 0.
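The byte-level sliding window and the resulting 0/1 feature values can be sketched as below. The text does not say how the "specific" N-grams are chosen, so in this sketch they are simply passed in as a hypothetical selected_grams list; N defaults to 4 as in the example.

```python
def byte_ngrams(data: bytes, n: int = 4) -> set[bytes]:
    """Slide a window of n bytes over data, one byte at a time, and collect each gram."""
    if n < 2:
        raise ValueError("the text requires N >= 2")
    return {data[i:i + n] for i in range(len(data) - n + 1)}


def ngram_feature_values(code_segment: bytes, selected_grams: list[bytes], n: int = 4) -> dict[bytes, int]:
    """Feature value is 1 if the selected gram occurs in the code segment, otherwise 0."""
    present = byte_ngrams(code_segment, n)
    return {gram: int(gram in present) for gram in selected_grams}
```

A segment shorter than N yields no grams, so every selected N-gram feature is 0 for it.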
The static features and the feature values of the static features of the selected sample program are described above, and the dynamic features and the feature values of the dynamic features of the selected sample program are described below.
The dynamic features reflect the behavior of the sample program while it runs, including process operations, file operations, network operations, and registry operations. To obtain the feature values of the dynamic features of the selected sample program, the selected sample program may be run in a sandbox, which also prevents a malicious sample program from affecting the host system while the feature values are being collected.
In the embodiments of the present application, the dynamic features may include the application programming interfaces (APIs) called by the sample program while it runs and/or the parameters it uses. An API indicates the type of an operation performed by the sample program, such as creating a file or modifying the registry, and a parameter indicates the object of that operation, such as a file path or a registry path. Each runtime behavior corresponds to one API and at least one parameter.
To improve the generalization capability of the trained classification model and prevent overfitting, the embodiments of the present application abstract parameter models from the concrete parameters.
For example, the file path parameter "c:\users\zhangsan\appdata\local\temp\" (the user's temporary directory) may be abstracted as the parameter model "c:\users\*\appdata\[roaming|local]\[te|t]mp\", where "*" indicates that any content may appear at that position; "[roaming|local]" indicates that either roaming or local at that position counts as a hit, because both directories serve essentially the same purpose in the operating system, namely storing data released by applications; and "[te|t]mp" indicates that either temp or tmp at that position counts as a hit, because different operating systems may name the temporary directory differently.
As another example, the registry path "hklm\software\microsoft\windows\currentversion\runonce" may be abstracted as the parameter model "hk[cu|lm]\software\microsoft\windows\currentversion\runonce", where "hk[cu|lm]" indicates that either hkcu or hklm at that position counts as a hit, so that the model matches different registry root keys.
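One way to test whether a concrete parameter hits a parameter model is to translate the model into a regular expression. The sketch below assumes the wildcard syntax described above ("*" for any content, "[a|b]" for alternatives) and case-insensitive matching; the helper names pattern_to_regex and matches_model are hypothetical, not part of the present application.

```python
import re


def pattern_to_regex(model: str) -> re.Pattern:
    """Compile a parameter model into a case-insensitive regular expression.

    Assumed syntax: '*' matches any run of characters, and '[a|b]' matches
    exactly one of the listed alternatives; every other character is literal.
    """
    parts = []
    i = 0
    while i < len(model):
        ch = model[i]
        if ch == "*":
            parts.append(".*")
            i += 1
        elif ch == "[":
            end = model.index("]", i)
            alternatives = model[i + 1:end].split("|")
            parts.append("(?:" + "|".join(re.escape(a) for a in alternatives) + ")")
            i = end + 1
        else:
            parts.append(re.escape(ch))
            i += 1
    return re.compile("".join(parts), re.IGNORECASE)


def matches_model(model: str, parameter: str) -> bool:
    """True if the concrete parameter is a hit for the parameter model."""
    return pattern_to_regex(model).fullmatch(parameter) is not None


# The two parameter models from the examples above.
TEMP_DIR_MODEL = r"c:\users\*\appdata\[roaming|local]\[te|t]mp" + "\\"
RUN_ONCE_MODEL = r"hk[cu|lm]\software\microsoft\windows\currentversion\runonce"
```

With these definitions, matches_model(TEMP_DIR_MODEL, r"c:\users\zhangsan\appdata\local\temp" + "\\") is a hit, so all users' temporary directories map to the same parameter model.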
Because neither APIs nor parameter models are numbers, each API and each parameter model may be assigned a code for ease of description. For example, if there are 150 APIs, they may be represented by codes 1 to 150, one code per API; if there are 400 parameter models, they may be represented by codes 1 to 400, one code per parameter model.
In the embodiments of the present application, a dynamic feature may be an API, a parameter model, or a combination of an API and a parameter model.
If the dynamic feature is an API, the frequency with which that API occurs in the selected sample program may be used as its feature value. The frequency of the API equals the number of times the API occurs while the selected sample program runs, divided by the total number of APIs in the preset dynamic feature set. For example, if the preset dynamic feature set includes 150 APIs and the API with code 1 occurs 30 times while the selected sample program runs, the feature value of the API with code 1 is 0.2 (30/150).
If the dynamic feature is a parameter model, the frequency with which that parameter model occurs in the selected sample program may be used as its feature value. This frequency equals the number of times the parameter model occurs in the selected sample program divided by the total number of parameter models in the preset dynamic feature set. For example, if the preset dynamic feature set includes 400 parameter models and the parameter model with code 3 occurs 40 times while the selected sample program runs, the feature value of the parameter model with code 3 is 0.1 (40/400).
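The frequency definition above can be written as a short function. It follows the text literally, dividing the occurrence count by the number of codes in the preset set (not by the total number of calls observed); the function name and the list-of-codes input format are assumptions for illustration.

```python
from collections import Counter


def frequency_features(observed_codes: list[int], preset_codes: list[int]) -> dict[int, float]:
    """Feature value of each preset code, following the definition in the text:
    occurrences of the code during the run / number of codes in the preset set."""
    counts = Counter(observed_codes)
    total = len(preset_codes)
    return {code: counts[code] / total for code in preset_codes}
```

For the API example, frequency_features([1] * 30, list(range(1, 151)))[1] gives 30/150 = 0.2; the same function applies to parameter-model codes (40/400 = 0.1).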
If the dynamic features are combinations of APIs and parameter models, each API may be combined with each parameter model, so that one combination includes one API and one parameter model. A combination may be identified, for example, as "<API code>_<parameter model code>"; a dynamic feature identified as "2_5" is the combination of the API with code 2 and the parameter model with code 5. In this case, the feature value of the dynamic feature is the frequency with which the corresponding combination occurs among all combinations in the preset dynamic feature set (hereinafter referred to as the frequency of the dynamic feature for brevity), which equals the number of occurrences of the combination divided by the total number of combinations. For example, if the preset dynamic feature set contains 1000 API and parameter model combinations in total and the combination "2_5" occurs 10 times, the frequency of "2_5" is 0.01 (10/1000); that is, the feature value of the dynamic feature "2_5" is 0.01.
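Building the "<API code>_<parameter model code>" identifiers and their frequencies might look like the sketch below. It assumes the sandbox output has already been reduced to (API code, parameter model code) pairs; the helper names are hypothetical.

```python
from collections import Counter


def combination_id(api_code: int, model_code: int) -> str:
    """Identifier of an API / parameter model combination, e.g. "2_5"."""
    return f"{api_code}_{model_code}"


def combination_features(events: list[tuple[int, int]], preset_ids: list[str]) -> dict[str, float]:
    """events: (API code, parameter model code) pairs observed while the sample runs.
    Feature value = occurrences of a combination / number of preset combinations."""
    counts = Counter(combination_id(api, model) for api, model in events)
    total = len(preset_ids)
    return {cid: counts[cid] / total for cid in preset_ids}
```

With a preset set of 1000 combinations (for instance 10 APIs times 100 parameter models) and ten observed (2, 5) events, the value for "2_5" is 0.01, matching the example.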
Of course, the feature value of a dynamic feature need not be its frequency. For example, if a dynamic feature can be obtained from the selected sample program, its feature value is 1; otherwise, its feature value is 0.
After the feature values of the static features and the feature values of the dynamic features of the selected sample program are acquired, S1022 may be executed.
S1022: Obtain the feature value of at least one candidate fusion feature of the selected sample program according to the feature value of the at least one static feature, the feature value of the at least one dynamic feature, and at least one fusion operation rule.
In conventional techniques, the features input into the program classification model are either static features only or dynamic features only. As mentioned above, static features reflect the structural characteristics of a program, but once the program's author changes the form of the program, the program classification model can no longer identify its category. Dynamic features reflect the behavior of the program while it runs; even if the form of the program changes, its behavior stays the same, so dynamic features overcome this weakness of static features. However, dynamic features must be extracted by running the program in a virtual environment such as a sandbox. If the program has an anti-virtualization function, for example one that skips certain commands when it detects that it is running in a sandbox, the sandbox cannot extract all of its dynamic features. Therefore, the category of a program cannot be determined accurately from dynamic features alone.
To solve this technical problem, in the embodiments of the present application the feature values of the static features and the feature values of the dynamic features are fused to obtain the feature values of fusion features; the program classification model is trained on the feature values of the fusion features and is later used to classify programs based on them. This approach retains the advantage that dynamic features do not depend on the form of the program, while the static features make up for incomplete dynamic feature extraction when the program has an anti-virtualization function. Compared with the prior art, training the program classification model on the feature values of fusion features therefore improves the accuracy of program category identification.
In a specific implementation, after the feature values of the static features and dynamic features of the selected sample program are obtained, the feature value of at least one candidate fusion feature of the selected sample program is obtained according to the feature value of the at least one static feature, the feature value of the at least one dynamic feature, and at least one fusion operation rule. Candidate fusion features that are highly effective at distinguishing the category to which a sample program belongs are then selected from the at least one candidate fusion feature as the target fusion features that are finally input into the program classification model.
In practical applications, there are various ways to obtain the feature value of the at least one candidate fusion feature of the selected sample program from the feature value of the at least one static feature, the feature value of the at least one dynamic feature, and the at least one fusion operation rule.
As one possible implementation manner, the at least one candidate fusion feature includes a first candidate fusion feature, and a feature value of the first candidate fusion feature is obtained based on a corresponding first fusion operation rule. The first fusion operation rule indicates that mathematical operations, such as multiplication, addition, subtraction, and the like, are performed on the feature value of the first static feature in the preset static feature set and the feature value of the first dynamic feature in the preset dynamic feature set. Wherein the first static feature comprises one or more static features and the first dynamic feature comprises one or more dynamic features.
For example, assume that the first static feature of the sample program is an executable code segment and the feature value of the executable code segment is a binary entropy value of the executable code segment. The first dynamic characteristic of the sample program is reading the application directory file, and the characteristic value of the reading application directory file is the frequency of the corresponding API and parameter model combination, which may also be referred to as the frequency of reading the application directory file for simplicity. Then, the first candidate fusion feature resulting from the fusion operation of the executable code segment and the read application directory file may have a feature value that is a product of the binary entropy value of the executable code segment and the frequency of reading the application directory file.
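A sketch of this first (mathematical) fusion rule follows. The text does not define "binary entropy value"; this sketch assumes it means the Shannon entropy of the byte distribution of the executable code segment, in bits per byte. The function names and the 0.05 frequency in the usage note are hypothetical.

```python
import math
from collections import Counter


def byte_entropy(data: bytes) -> float:
    """Shannon entropy of the byte distribution, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    counts = Counter(data)
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def product_fusion(static_value: float, dynamic_value: float) -> float:
    """First fusion rule with multiplication as the mathematical operation."""
    return static_value * dynamic_value
```

For example, a code segment containing every byte value exactly once has an entropy of 8.0, and if the frequency of reading application directory files were 0.05, the fused feature value would be 8.0 × 0.05 = 0.4. Addition or subtraction could be swapped in for the product in the same way.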
As another possible implementation, the at least one candidate fusion feature includes a second candidate fusion feature, whose feature value is obtained based on a corresponding second fusion operation rule. The second fusion operation rule instructs that a logical operation, such as AND, OR, or NAND, be performed on the feature value of a second static feature in the preset static feature set and the feature value of a second dynamic feature in the preset dynamic feature set. The second static feature includes one or more static features, and the second dynamic feature includes one or more dynamic features.
For example, suppose the second static feature of the sample program is the additional (overlay) information, whose feature value is determined by whether such information exists in the sample program: 1 if it exists, 0 otherwise. The second dynamic feature of the sample program is a networking operation, whose feature value is 1 if the sample program performs a networking operation and 0 otherwise. The fusion operation may then be an AND operation on the feature value of the second static feature and the feature value of the second dynamic feature; that is, the feature value of the second candidate fusion feature is 1 if the sample program has both additional information and a networking operation, and 0 otherwise.
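The logical fusion rule can be sketched as a lookup of the operations named above applied to two 0/1 feature values. The rule names and the LOGICAL_RULES table are assumptions for illustration.

```python
import operator

# Logical fusion operations named in the text; inputs are 0/1 feature values.
LOGICAL_RULES = {
    "and": operator.and_,
    "or": operator.or_,
    "nand": lambda a, b: 1 - (a & b),
}


def logical_fusion(rule: str, static_value: int, dynamic_value: int) -> int:
    """Second fusion rule: combine a 0/1 static value and a 0/1 dynamic value."""
    if static_value not in (0, 1) or dynamic_value not in (0, 1):
        raise ValueError("logical fusion expects 0/1 feature values")
    return LOGICAL_RULES[rule](static_value, dynamic_value)
```

In the example, logical_fusion("and", has_overlay, has_networking) is 1 only when the sample program both carries overlay data and performs a networking operation.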
As another possible implementation, the at least one candidate fusion feature includes a third candidate fusion feature, whose feature value is obtained based on a corresponding third fusion operation rule. The third fusion operation rule instructs to determine, from the preset static feature set and the preset dynamic feature set, the features that appear in both sets and have the same feature value in both, and to calculate the feature value of the third candidate fusion feature from the total number of such features.
For example, suppose the static feature set of the sample program includes 150 APIs and the dynamic feature set includes 50 APIs. After the feature values of the static features and the dynamic features of the sample program are obtained, 40 APIs are found in both sets with the same feature value. The feature value of the third candidate fusion feature may then be obtained from this total of 40 APIs.
Optionally, calculating the feature value of the third candidate fusion feature from the total number of such features may include the following. First, determine the larger of a first value and a second value, where the first value is the total number of static features in the preset static feature set and the second value is the total number of dynamic features in the preset dynamic feature set. Second, calculate the ratio of the total number of features that appear in both sets with the same feature value to this larger value, and use the ratio as the feature value of the third candidate fusion feature.
For example, in the above example, since the static feature set includes a greater total number of APIs than the dynamic feature set, the maximum of the first and second values is 150, and the feature value of the third candidate fused feature is equal to about 0.27 (40/150).
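The third fusion rule, with the optional ratio described above, can be sketched as follows. It assumes each feature set is represented as a mapping from a feature name to its feature value, which is an illustrative choice; values are compared with exact equality as the text says "the same feature value", though a tolerance might be preferable for floating-point feature values.

```python
def shared_feature_ratio(static_values: dict[str, float], dynamic_values: dict[str, float]) -> float:
    """Third fusion rule: number of features present in both sets with equal
    values, divided by the size of the larger preset feature set."""
    shared = sum(1 for name, value in static_values.items()
                 if name in dynamic_values and dynamic_values[name] == value)
    larger = max(len(static_values), len(dynamic_values))
    return shared / larger if larger else 0.0
```

Reproducing the example, 150 static APIs and 50 dynamic APIs of which 40 share the same value give 40/150, about 0.27.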
It should be understood that calculating this ratio is only one way to obtain the feature value of the third candidate fusion feature and does not limit the present application; those skilled in the art may design other calculations according to the specific situation.
In addition, the above three implementations do not limit the technical solution of the present application, and those skilled in the art may design other implementations according to actual situations.
After the feature value of at least one candidate fusion feature is obtained, S103 may be executed in the embodiment of the present application.
S103: For a first candidate fusion feature in the at least one candidate fusion feature, perform the following processing, and likewise for every other candidate fusion feature, to obtain an evaluation value of each candidate fusion feature: determine the evaluation value of the first candidate fusion feature according to the feature value of the first candidate fusion feature in each sample program and the category of each sample program.
Not every candidate fusion feature can effectively distinguish the categories of the sample programs. To improve the training efficiency of the program classification model, the embodiments of the present application therefore select target fusion features automatically from the at least one candidate fusion feature, based on the evaluation value of each candidate fusion feature. The magnitude of the evaluation value represents how effectively the first candidate fusion feature distinguishes the category to which a sample program belongs, so the evaluation values can be used to select, from the candidate fusion features, target fusion features that effectively distinguish the categories of the sample programs.
Specifically, the feature values of the first candidate fusion feature in the sample programs of each category may be aggregated per category to obtain a statistic of the first candidate fusion feature for each category, for example the median, mean, or variance of its feature values. The evaluation value of the first candidate fusion feature is then determined from these per-category statistics; it may be, for example, the ratio, difference, or variance of the statistics across the categories.
For example, assume that the at least two categories are normal programs and malicious programs, the static feature of the sample programs is the executable code segment, and the dynamic features are: reading application directory files, writing system files, deleting system files, reading application version information from the registry, and adding an auto-start entry to the registry. The binary entropy of the executable code segment of each sample program is multiplied by the feature value of each of the five dynamic features to obtain the feature values of five candidate fusion features of that sample program. The feature values of the sample programs are then aggregated separately for the normal and the malicious category to obtain a statistic, such as the median, for each category. The evaluation value of each candidate fusion feature is obtained from its statistics in the two categories: it may be the ratio of the difference between the normal-program statistic and the malicious-program statistic to the larger of the two statistics. Table 1 shows how the evaluation values of the five candidate fusion features are calculated in this two-category (normal and malicious) scenario.
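The per-category aggregation can be sketched in Python as follows, assuming the feature values of one candidate fusion feature and the sample categories are given as two parallel lists; the function name `per_class_stats` is hypothetical, and the median is used as the default statistic only because it is the one the examples mention.

```python
import statistics
from collections import defaultdict
from typing import Callable, Dict, List, Sequence


def per_class_stats(values: Sequence[float], labels: Sequence[str],
                    stat: Callable[[List[float]], float] = statistics.median
                    ) -> Dict[str, float]:
    """Group one candidate fusion feature's values by category and summarize each group."""
    groups: Dict[str, List[float]] = defaultdict(list)
    for value, label in zip(values, labels):
        groups[label].append(value)
    # One statistic (median by default; mean or variance also possible) per category.
    return {label: stat(group) for label, group in groups.items()}


print(per_class_stats([4.0, 6.0, 1.0, 3.0],
                      ["normal", "normal", "malicious", "malicious"]))
# {'normal': 5.0, 'malicious': 2.0}
```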
TABLE 1
[Table image: evaluation values of the five candidate fusion features for normal and malicious programs]
Note: "+" indicates multiplication.
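The Table 1 procedure (fusion by multiplication, per-category median, then the ratio rule) can be sketched in Python as follows. The function names are hypothetical, and returning 0.0 when both medians are zero is an assumption the text does not cover.

```python
import statistics
from typing import List, Sequence


def fuse_by_product(static_value: float, dynamic_values: Sequence[float]) -> List[float]:
    # One candidate fusion feature per dynamic feature: static value * dynamic value.
    return [static_value * d for d in dynamic_values]


def ratio_evaluation(normal: Sequence[float], malicious: Sequence[float]) -> float:
    """(median_normal - median_malicious) / max(median_normal, median_malicious)."""
    m_normal = statistics.median(normal)
    m_malicious = statistics.median(malicious)
    larger = max(m_normal, m_malicious)
    # Assumption: a zero denominator yields an evaluation value of 0.0.
    return (m_normal - m_malicious) / larger if larger else 0.0


print(fuse_by_product(2.0, [1, 0, 3]))           # [2.0, 0.0, 6.0]
print(ratio_evaluation([10, 10, 10], [1, 1, 1]))  # 0.9
```

A positive value means the normal-program median is larger, which matches the direction assumed for Table 1 in the text.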
Now assume that the at least two categories are three categories of malicious programs, such as trojan horses, downloaders, and worms, with the same static and dynamic features as in the example above. The evaluation value of each of the five candidate fusion features may then be the variance of its statistics across the three categories. Table 2 shows how the evaluation values of the five candidate fusion features are calculated in the scenario where the sample programs are trojan horses, downloaders, and worms.
TABLE 2
[Table image: evaluation values of the five candidate fusion features for trojan horses, downloaders, and worms]
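For the multi-category case in Table 2 (and likewise Table 4), the evaluation value is the variance of the per-category statistics. The following Python sketch assumes the median as the statistic and the population variance (`statistics.pvariance`); the text does not say whether population or sample variance is meant, and the name `variance_evaluation` is hypothetical.

```python
import statistics
from typing import Dict, Sequence


def variance_evaluation(values_by_class: Dict[str, Sequence[float]]) -> float:
    """Population variance of the per-category medians of one candidate fusion feature."""
    medians = [statistics.median(v) for v in values_by_class.values()]
    return statistics.pvariance(medians)


# Hypothetical feature values, grouped by malicious-program category.
print(variance_evaluation({"trojan": [1.0], "downloader": [2.0], "worm": [3.0]}))
```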
As another example, assume that the at least two categories are normal programs and malicious programs, the static feature of the sample programs is the additional information, whose feature value is the ratio of the size of the additional information to the size of the whole file, and the dynamic features are network operation behavior, registry auto-start item operation behavior, executable-file write behavior, and reading and loading system DLLs. The ratio of each sample program's additional-information size to its whole file size is multiplied by the feature value of each of the four dynamic features to obtain the feature values of four candidate fusion features of that sample program. The feature values of the sample programs are aggregated separately for the normal and the malicious category to obtain a statistic, such as the median, for each category, and the evaluation value of each candidate fusion feature is obtained from its statistics in the two categories; here it may be the difference between the normal-program statistic and the malicious-program statistic. Table 3 shows how the evaluation values of the four candidate fusion features are calculated in this two-category (normal and malicious) scenario.
TABLE 3
[Table image: evaluation values of the four candidate fusion features for normal and malicious programs]
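The Table 3 variant can be sketched in Python as follows. The text calls the evaluation value "the difference" between the normal-program and malicious-program statistics but also reports a positive value where malicious programs score higher, so this sketch assumes the absolute difference; both function names are hypothetical.

```python
import statistics
from typing import Sequence


def additional_info_ratio(additional_size: int, file_size: int) -> float:
    # Static feature value from the text: additional-information size / whole-file size.
    return additional_size / file_size if file_size else 0.0


def difference_evaluation(normal: Sequence[float], malicious: Sequence[float]) -> float:
    # Assumption: absolute difference of the per-category medians.
    return abs(statistics.median(normal) - statistics.median(malicious))


print(additional_info_ratio(25, 100))           # 0.25
print(difference_evaluation([1, 1], [4, 4]))    # 3.0
```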
Now assume that the at least two categories are three categories of malicious programs, such as trojan horses, downloaders, and worms, with the same static and dynamic features as in the preceding example. The evaluation value of each of the four candidate fusion features may then be the variance of its statistics across the three categories. Table 4 shows how the evaluation values of the four candidate fusion features are calculated in the scenario where the sample programs are trojan horses, downloaders, and worms.
TABLE 4
[Table image: evaluation values of the four candidate fusion features for trojan horses, downloaders, and worms]
S104: Select the target fusion feature from the at least one candidate fusion feature according to the evaluation value of each candidate fusion feature.
In the embodiments of the present application, the evaluation value of each selected target fusion feature indicates a greater degree of effectiveness than the evaluation values of the other candidate fusion features that are not selected. There may be one or more target fusion features.
In a binary classification scenario, that is, when the sample programs belong to two categories (normal programs and malicious programs), take Table 1 as an example: the evaluation value of a candidate fusion feature is the ratio of the difference between the normal-program median and the malicious-program median to the larger of the two medians. Empirically, the executable code segment of a normal program is more uniform than that of a malicious program, so its binary entropy is larger; normal programs also read application directory files and read application version information from the registry more frequently than malicious programs. Therefore, in the scenario of Table 1, the higher the ratio of a candidate fusion feature, the more effectively it distinguishes the category to which a sample program belongs; conversely, the lower the ratio, the less effectively it does so.
Therefore, in practical applications, a threshold may be set, and the candidate fusion features whose evaluation values are greater than or equal to the threshold are used as target fusion features.
For example, let the threshold be 0.6. In Table 1, the evaluation values of the first, fourth, and fifth candidate fusion features are 0.888, 0.8864, and 0.6418 respectively, all greater than 0.6, so these three candidate fusion features can be used as target fusion features.
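Threshold-based selection can be sketched in Python as follows, for the cases where a higher evaluation value means a more effective feature (the ratio rule of Table 1, the difference rule of Table 3, and the variance rule of Tables 2 and 4). The feature names are hypothetical, and only the evaluation values quoted in the text are used.

```python
from typing import Dict, List


def select_target_features(evaluations: Dict[str, float], threshold: float) -> List[str]:
    """Keep every candidate whose evaluation value is >= threshold (higher = more effective)."""
    return [name for name, score in evaluations.items() if score >= threshold]


# Evaluation values quoted for Table 1, with the example threshold of 0.6.
table1 = {"fusion_1": 0.888, "fusion_4": 0.8864, "fusion_5": 0.6418}
print(select_target_features(table1, 0.6))  # ['fusion_1', 'fusion_4', 'fusion_5']
```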
Taking Table 3 as an example, the evaluation value of each candidate fusion feature is the difference between the normal-program statistic and the malicious-program statistic; the larger the difference, the more effectively the candidate fusion feature distinguishes the categories of the sample programs. Empirically, a malicious program may add executable code in the additional information, which makes the ratio of the additional-information size to the whole file size high, and malicious programs are also likely to perform network operations. In the scenario of Table 3, the evaluation value of the first candidate fusion feature is therefore high, reaching 0.145. If the threshold is 0.05, the first candidate fusion feature is the only one in Table 3 whose evaluation value exceeds the threshold, so it is used as the target fusion feature.
In a multi-classification scenario, that is, when the sample programs include several categories of malicious programs, take Table 2 as an example: the evaluation value of each candidate fusion feature may be the variance of its statistics across the categories, and the larger the variance, the more effectively the candidate fusion feature distinguishes the category to which a sample program belongs. If the threshold is 20, the evaluation values of the second candidate fusion feature (51.014) and the fifth candidate fusion feature (38.592) in Table 2 both exceed the threshold, so these two candidate fusion features can be used as target fusion features. Their evaluation values are high because worms write system files and add auto-start entries to the registry more frequently than normal programs do.
Since table 4 is similar to table 2, the process of selecting the target fusion feature is not described in detail here.
After the target fusion feature is obtained, S105 may be performed.
S105: Train the program classification model according to the feature values of the target fusion features in each sample program.
In the embodiments of the present application, the trained program classification model is used to classify programs whose category is unknown. S101 to S105 describe the training process; in use, the feature values of the target fusion features of a program of unknown category are input to the program classification model, which outputs the category to which the program belongs. The specific steps are described in detail below.
The program classification model may be trained by machine learning, for example, Random Forest (RF) algorithm, Artificial Neural Network (ANN) algorithm, and the like.
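To illustrate only the training and prediction interface (rows of target-fusion-feature values in, category labels out), the following Python sketch uses a deliberately simple nearest-centroid classifier as a stand-in. It is not a random forest or an artificial neural network, which the text names and which would normally come from a machine-learning library; the class name `NearestCentroid` here is hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence


class NearestCentroid:
    """Stand-in classifier: one mean vector per category; predicts the closest one."""

    def fit(self, X: Sequence[Sequence[float]], y: Sequence[str]) -> "NearestCentroid":
        sums: Dict[str, List[float]] = {}
        counts: Dict[str, int] = defaultdict(int)
        for row, label in zip(X, y):
            if label not in sums:
                sums[label] = [0.0] * len(row)
            sums[label] = [s + v for s, v in zip(sums[label], row)]
            counts[label] += 1
        # Mean target-fusion-feature vector of each category.
        self.centroids = {lab: [s / counts[lab] for s in vec] for lab, vec in sums.items()}
        return self

    def predict_one(self, row: Sequence[float]) -> str:
        # Category whose centroid is closest in Euclidean distance.
        return min(self.centroids, key=lambda lab: math.dist(row, self.centroids[lab]))


model = NearestCentroid().fit([[0, 0], [0, 1], [10, 10], [10, 11]],
                              ["normal", "normal", "malicious", "malicious"])
print(model.predict_one([1, 1]))  # normal
```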
As mentioned above, the target fusion features combine the advantages of static features and dynamic features. Compared with the conventional technique of training a program classification model on static features only or on dynamic features only, training the model on the feature values of the target fusion features in the sample programs gives a better training effect; that is, the resulting model classifies programs of unknown category more accurately.
In addition, to let the program classification model classify programs of unknown category even more accurately, the model may be trained on the feature values of the target fusion features in each sample program together with the feature values of the static features and of the dynamic features of each sample program.
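The combined input for this variant can be sketched in Python as a simple concatenation, assuming each group of feature values is an ordered list. The function name is hypothetical; what matters is that the same column order is used when training and when classifying a target program.

```python
from typing import List, Sequence


def build_feature_vector(fusion: Sequence[float], static: Sequence[float] = (),
                         dynamic: Sequence[float] = ()) -> List[float]:
    # Target fusion feature values first, then (optionally) raw static and dynamic values.
    return [*fusion, *static, *dynamic]


print(build_feature_vector([0.9, 0.4], static=[6.5], dynamic=[4.0, 0.0]))
# [0.9, 0.4, 6.5, 4.0, 0.0]
```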
Referring to fig. 4, a flowchart of a program classification method provided in an embodiment of the present application is shown.
The program classification method provided by the embodiment of the application comprises the following steps:
S201: Acquire the target program.
In the embodiment of the present application, the target program is a program whose category is unknown. The target program can be acquired in various ways, such as downloading from an open source website, and the like.
S202: Acquire the feature value of each static feature and of each dynamic feature of the target program according to a preset static feature set including at least one static feature and a preset dynamic feature set including at least one dynamic feature.
In the embodiments of the present application, the preset static feature set and the preset dynamic feature set may be determined in advance; the preset static feature set includes at least one static feature and the preset dynamic feature set includes at least one dynamic feature. The static features reflect the structural characteristics of the target program, and the dynamic features reflect its behavior at run time. The preset static feature set and the preset dynamic feature set may be the same sets used above for training. For the types of static and dynamic features and how they are obtained, refer to the description of the static and dynamic features of the sample programs in Fig. 3; details are not repeated here.
S203: Obtain the feature value of at least one target fusion feature of the target program, where each such feature value is obtained based on a corresponding fusion operation rule, and the fusion operation rule indicates that a fusion operation is performed on the feature value of a specified static feature in the preset static feature set and the feature value of a specified dynamic feature in the preset dynamic feature set.
In the embodiment of the application, the feature value of the target fusion feature is obtained by executing fusion operation according to the feature value of the specified static feature in the preset static feature set and the feature value of the specified dynamic feature in the preset dynamic feature set. For a specific fusion operation, please refer to the description related to the fusion operation executed on the feature value of the specified static feature and the feature value of the specified dynamic feature of the sample program in fig. 3, which is not described herein again.
S204: Input the feature value of the at least one target fusion feature of the target program into the program classification model to obtain a classification result of the target program.
In the method and device of the present application, what is input into the program classification model is the feature value of the target fusion features, obtained by performing the fusion operation on the feature values of the specified static features and the specified dynamic features of the target program, rather than the feature values of static features alone or of dynamic features alone. This combines two advantages: dynamic features can identify programs whose form has been changed, and, when a program has anti-virtual-environment capabilities, static features reflecting its structural characteristics can still identify it. The accuracy with which the program classification model classifies the target program is thereby effectively improved.
The following uses an application scenario as an example to describe the complete process of the technical solution provided in the embodiments of the present application, from training the program classification models to classifying a target program.
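Steps S203 and S204 can be sketched in Python as follows: the fusion operation rules fixed during training are re-applied to the target program's static and dynamic feature values, and the resulting list is what S204 feeds to the trained model. The rule format, the feature names, and the function name are hypothetical; multiplication is used because it is the fusion operation in the examples above.

```python
import operator
from typing import Callable, Dict, List, Tuple

# Hypothetical rule format: (static feature name, dynamic feature name, binary operation).
FusionRule = Tuple[str, str, Callable[[float, float], float]]


def target_fusion_values(static: Dict[str, float], dynamic: Dict[str, float],
                         rules: List[FusionRule]) -> List[float]:
    """Apply each target fusion rule selected during training to one program's features."""
    return [op(static[s], dynamic[d]) for s, d, op in rules]


rules = [("code_segment_entropy", "read_app_dir_files", operator.mul)]
print(target_fusion_values({"code_segment_entropy": 6.5},
                           {"read_app_dir_files": 4.0}, rules))  # [26.0]
```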
The program classification method provided by the embodiment of the application comprises the following steps:
S301: Obtain a plurality of sample programs whose categories have been labeled in advance; the sample programs include normal programs and malicious programs.
S302: Execute S102 to S105 on the plurality of sample programs to obtain a first program classification model.
The first program classification model is a binary classification model that can classify a target program as a normal program or a malicious program.
S303: Execute S102 to S105 on the malicious programs among the plurality of sample programs to obtain a second program classification model.
The malicious programs in the plurality of sample programs include a plurality of malicious program categories, so the second program classification model is a multi-classification model which can determine the target program as one of the plurality of malicious program categories.
S304: Acquire a target program whose category is unknown.
S305: Execute S202 and S203 on the target program to obtain the feature values of its target fusion features.
S306: Input the feature values of the target fusion features of the target program into the first program classification model to obtain a classification result indicating whether the target program is a malicious program or a normal program.
S307: When the target program is classified as a malicious program, input the feature values of its target fusion features into the second program classification model to obtain a classification result indicating which of the malicious program categories the target program belongs to.
S308: If the second program classification model cannot determine the category of the target program, that is, the target program belongs to none of the malicious program categories known to the second program classification model, cluster a plurality of such target programs according to the feature values of at least one target fusion feature of each target program, and obtain the category of each target program from the clustering result.
Various clustering algorithms can be used, such as density-based spatial clustering of applications with noise (DBSCAN); the present application does not specifically limit the choice.
S309: Once the category of a target program is obtained through clustering, label the target program with that category and retrain the second program classification model, so that the new second program classification model can identify programs of the new category.
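The two-stage decision of S306 to S308 can be sketched in Python as follows. The models are represented as plain callables purely for illustration, and using `None` to signal "no known malicious category" (the hand-off to clustering in S308) is an assumption of this sketch, not something the text specifies.

```python
from typing import Callable, Optional, Sequence

Classifier = Callable[[Sequence[float]], Optional[str]]


def classify_cascade(features: Sequence[float], binary_model: Classifier,
                     family_model: Classifier) -> Optional[str]:
    """Normal/malicious decision first; malicious programs then go to the family model."""
    verdict = binary_model(features)
    if verdict != "malicious":
        return verdict
    # None means no known malicious category matched (to be handled by clustering).
    return family_model(features)


# Hypothetical stand-in models, for illustration only.
def binary(f: Sequence[float]) -> str:
    return "malicious" if f[0] > 0.5 else "normal"


def family(f: Sequence[float]) -> Optional[str]:
    return "worm" if f[1] > 0.5 else None


print(classify_cascade([0.9, 0.9], binary, family))  # worm
```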
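DBSCAN is named above only as one option. The following is a minimal from-scratch Python sketch of standard DBSCAN over target-fusion-feature vectors, written without third-party libraries; the `eps` and `min_pts` values in the usage line are illustrative assumptions, and the quadratic neighbor search is only suitable for small batches of target programs.

```python
import math
from typing import List, Optional, Sequence


def dbscan(points: Sequence[Sequence[float]], eps: float, min_pts: int) -> List[int]:
    """Return a cluster id for each point; -1 marks noise."""
    labels: List[Optional[int]] = [None] * len(points)  # None = not yet visited

    def neighbors(i: int) -> List[int]:
        return [j for j in range(len(points)) if math.dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # noise for now; may later become a border point
            continue
        cluster += 1  # i is a core point: start a new cluster
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_pts:  # only core points expand the cluster
                queue.extend(j_neighbors)
    return labels


# Hypothetical target-fusion-feature vectors of unclassified target programs.
points = [[0, 0], [0, 0.1], [0.1, 0], [5, 5], [5, 5.1], [5.1, 5], [20, 20]]
print(dbscan(points, eps=0.5, min_pts=2))  # [0, 0, 0, 1, 1, 1, -1]
```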
Correspondingly, referring to fig. 5, an embodiment of the present application further provides an apparatus for training a program classification model, where the apparatus includes:
a receiving unit 501, configured to receive a plurality of input sample programs, where a sample program is a program whose category has been labeled in advance, and the plurality of sample programs belong to at least two different categories;
a first processing unit 502, configured to select a sample program from the multiple sample programs, and perform the following processing to obtain a feature value of at least one candidate fusion feature of the selected sample program until each sample program of the multiple sample programs is processed:
acquiring the feature value of each static feature and of each dynamic feature of the selected sample program according to a preset static feature set including at least one static feature and a preset dynamic feature set including at least one dynamic feature, where the static features reflect the structural characteristics of the selected sample program and the dynamic features reflect its behavior at run time;
obtaining the feature value of at least one candidate fusion feature of the selected sample program according to the feature value of the at least one static feature, the feature value of the at least one dynamic feature, and at least one fusion operation rule, where the feature value of each candidate fusion feature is obtained based on the corresponding fusion operation rule, and the fusion operation rule indicates that a fusion operation is performed on the feature value of a specified static feature in the preset static feature set and the feature value of a specified dynamic feature in the preset dynamic feature set;
a second processing unit 503, configured to perform the following processing for a first candidate fusion feature of the at least one candidate fusion feature, and likewise for every other candidate fusion feature, to obtain an evaluation value of each candidate fusion feature: determining the evaluation value of the first candidate fusion feature according to the feature value of the first candidate fusion feature in each sample program and the category of each sample program, where the magnitude of the evaluation value represents how effectively the first candidate fusion feature distinguishes the category to which a sample program belongs;
a selecting unit 504, configured to select a target fusion feature from the at least one candidate fusion feature according to the evaluation value of each candidate fusion feature, where the evaluation value of the target fusion feature indicates a greater degree of effectiveness than the evaluation values of the other candidate fusion features in the at least one candidate fusion feature;
and the training unit 505 is configured to train to obtain a program classification model according to the feature value of the target fusion feature in each sample program.
The specific work flow of the apparatus shown in fig. 5 can be referred to the related description in the foregoing embodiment of the program classification model training method.
Referring to fig. 6, an embodiment of the present application further provides a program classifying device, where the device includes:
a program acquisition unit 601 for acquiring a target program;
a first feature value obtaining unit 602, configured to obtain the feature value of each static feature and of each dynamic feature of the target program according to a preset static feature set including at least one static feature and a preset dynamic feature set including at least one dynamic feature, where the static features reflect the structural characteristics of the target program and the dynamic features reflect its behavior at run time;
a second feature value obtaining unit 603, configured to obtain a feature value of at least one target fusion feature of the target program, where the feature value of the at least one target fusion feature of the target program is obtained based on a corresponding fusion operation rule, and the fusion operation rule indicates that a fusion operation is performed on a feature value of a specified static feature in the preset static feature set and a feature value of a specified dynamic feature in the preset dynamic feature set;
the classifying unit 604 is configured to input a feature value of at least one target fusion feature of the target program into the program classification model, so as to obtain a classification result of the target program.
The specific work flow of the apparatus shown in fig. 6 can be referred to the related description in the foregoing embodiment of the program classification method.
Referring to fig. 7, an embodiment of the present application further provides a program classification model training apparatus, including:
a processor 710, a memory 720, and a network interface 730, which are interconnected by a bus 740.
The memory 720 includes, but is not limited to, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), or portable compact disc read-only memory (CD-ROM).
The processor 710 may be one or more Central Processing Units (CPUs), and in the case that the processor 710 is one CPU, the CPU may be a single-core CPU or a multi-core CPU.
The network interface 730 may be a wired interface, such as a Fiber Distributed Data Interface (FDDI) interface or a Gigabit Ethernet (GE) interface; the network interface 730 may also be a wireless interface.
The network interface 730 is configured to receive a plurality of input sample programs, where a sample program is a program whose category has been labeled in advance, and the sample programs belong to at least two different categories.
A memory 720 for storing program code;
a processor 710 for reading the program code stored in the memory 720, performing the following operations:
selecting a sample program from the plurality of sample programs, and performing the following processing to obtain a feature value of at least one candidate fusion feature of the selected sample program until each sample program in the plurality of sample programs is processed:
acquiring the feature value of each static feature and of each dynamic feature of the selected sample program according to a preset static feature set including at least one static feature and a preset dynamic feature set including at least one dynamic feature, where the static features reflect the structural characteristics of the selected sample program and the dynamic features reflect its behavior at run time;
obtaining the feature value of at least one candidate fusion feature of the selected sample program according to the feature value of the at least one static feature, the feature value of the at least one dynamic feature, and at least one fusion operation rule, where the feature value of each candidate fusion feature is obtained based on the corresponding fusion operation rule, and the fusion operation rule indicates that a fusion operation is performed on the feature value of a specified static feature in the preset static feature set and the feature value of a specified dynamic feature in the preset dynamic feature set;
for a first candidate fusion feature in the at least one candidate fusion feature, performing the following processing, and likewise for every other candidate fusion feature, to obtain an evaluation value of each candidate fusion feature: determining the evaluation value of the first candidate fusion feature according to the feature value of the first candidate fusion feature in each sample program and the category of each sample program, where the magnitude of the evaluation value represents how effectively the first candidate fusion feature distinguishes the category to which a sample program belongs;
selecting a target fusion feature from the at least one candidate fusion feature according to the evaluation value of each candidate fusion feature, where the evaluation value of the target fusion feature indicates a greater degree of effectiveness than the evaluation values of the other candidate fusion features in the at least one candidate fusion feature;
and training to obtain a program classification model according to the characteristic value of the target fusion characteristic in each sample program.
The implementation of the device shown in fig. 7 can be seen in the relevant description in fig. 3.
Referring to fig. 8, an embodiment of the present application further provides a program classification device, including:
a processor 810, a memory 820, and a network interface 830, the processor 810, the memory 820, and the network interface 830 being interconnected by a bus 840.
The memory 820 includes, but is not limited to, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), or portable compact disc read-only memory (CD-ROM).
The processor 810 may be one or more Central Processing Units (CPUs), and in the case that the processor 810 is one CPU, the CPU may be a single-core CPU or a multi-core CPU.
The network interface 830 may be a wired interface, such as a Fiber Distributed Data Interface (FDDI) interface or a Gigabit Ethernet (GE) interface; the network interface 830 may also be a wireless interface.
A network interface 830 for acquiring a target program;
a memory 820 for storing program code;
a processor 810 for reading the program code stored in the memory 820 and performing the following operations:
acquiring the feature value of each static feature and of each dynamic feature of the target program according to a preset static feature set including at least one static feature and a preset dynamic feature set including at least one dynamic feature, where the static features reflect the structural characteristics of the target program and the dynamic features reflect its behavior at run time;
acquiring the feature value of at least one target fusion feature of the target program, where the feature value of each target fusion feature is obtained based on a corresponding fusion operation rule, and the fusion operation rule indicates that a fusion operation is performed on the feature value of a specified static feature in the preset static feature set and the feature value of a specified dynamic feature in the preset dynamic feature set;
and inputting the characteristic value of at least one target fusion characteristic of the target program into the program classification model to obtain a classification result of the target program.
The implementation of the device shown in fig. 8 can be seen in the relevant description in fig. 4.
Embodiments of the present application also provide a computer-readable storage medium, which includes instructions that, when executed on a computer, cause the computer to perform the above method for training a program classification model.
Embodiments of the present application also provide a computer-readable storage medium, which includes instructions that, when executed on a computer, cause the computer to perform the above program classification method.
The terms "first," "second," "third," "fourth," and the like in the description and in the claims of the present application and in the drawings described above, if any, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It will be appreciated that the data so used may be interchanged under appropriate circumstances such that the embodiments described herein may be practiced otherwise than as specifically illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
Those skilled in the art may clearly understand that, for convenience and brevity of description, for the specific working processes of the systems, apparatuses, and units described above, reference may be made to the corresponding processes in the foregoing method embodiments; details are not described herein again.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus, and method may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative. The division into units, for instance, is merely a logical functional division, and other divisions are possible in an actual implementation: a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the displayed or discussed mutual couplings, direct couplings, or communication connections may be indirect couplings or communication connections through some interfaces, devices, or units, and may be electrical, mechanical, or in other forms.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units; that is, they may be located in one place or distributed across a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, each unit may exist alone physically, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as a stand-alone product, it may be stored in a computer-readable storage medium. Based on such an understanding, the technical solutions of the present application essentially, or the part contributing to the prior art, or all or part of the technical solutions, may be embodied in the form of a software product. The software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of the present application. The foregoing storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
Those skilled in the art should be aware that, in one or more of the foregoing examples, the functions described in the present invention may be implemented by hardware, software, firmware, or any combination thereof. When implemented by software, the functions may be stored in a computer-readable medium or transmitted as one or more instructions or code on a computer-readable medium. Computer-readable media include computer storage media and communication media, where communication media include any medium that facilitates the transfer of a computer program from one place to another. A storage medium may be any available medium accessible to a general-purpose or special-purpose computer.

Claims (27)

1. A method for training a program classification model, wherein the method comprises:
receiving a plurality of input sample programs, where a sample program is a program whose category has been labeled in advance, and the plurality of sample programs belong to at least two different categories;
selecting one sample program from the plurality of sample programs and performing the following processing to obtain a feature value of at least one candidate fusion feature of the selected sample program, until each of the plurality of sample programs has been processed:
obtaining, according to a preset static feature set comprising at least one static feature and a preset dynamic feature set comprising at least one dynamic feature, a feature value of each static feature and a feature value of each dynamic feature of the selected sample program, where the static features reflect structural characteristics of the selected sample program and the dynamic features reflect the behavior exhibited by the selected sample program while it runs;
obtaining the feature value of the at least one candidate fusion feature of the selected sample program according to the feature value of at least one static feature and the feature value of at least one dynamic feature of the selected sample program and at least one fusion operation rule, where the feature value of each of the at least one candidate fusion feature is obtained based on a corresponding fusion operation rule, and the fusion operation rule indicates that a fusion operation is performed on the feature value of a specified static feature in the preset static feature set and the feature value of a specified dynamic feature in the preset dynamic feature set;
performing the following processing for a first candidate fusion feature of the at least one candidate fusion feature, and likewise for each remaining candidate fusion feature, to obtain an evaluation value of each candidate fusion feature: determining the evaluation value of the first candidate fusion feature according to the feature value of the first candidate fusion feature in each sample program and the category of each sample program, where the magnitude of the evaluation value reflects how effective the first candidate fusion feature is at distinguishing the categories to which the sample programs belong;
selecting a target fusion feature from the at least one candidate fusion feature according to the evaluation value of each candidate fusion feature, where the degree of effectiveness reflected by the evaluation value of the target fusion feature is greater than that reflected by the evaluation values of the other candidate fusion features of the at least one candidate fusion feature; and
training a program classification model according to the feature value of the target fusion feature in each sample program.
2. The method according to claim 1, wherein determining the evaluation value of the first candidate fusion feature according to the feature value of the first candidate fusion feature in each sample program and the category of each sample program comprises:
collecting statistics, for each category to which the sample programs belong, on the feature values of the first candidate fusion feature in the sample programs of that category, to obtain a statistical value of the first candidate fusion feature for each category; and
determining the evaluation value of the first candidate fusion feature according to the statistical values of the first candidate fusion feature for the categories.
3. The method according to claim 2, wherein the statistical value comprises one or more of the following: a median, a mean, and a variance of the feature values of the first candidate fusion feature.
4. The method according to any one of claims 1 to 3, wherein the feature value of the first candidate fusion feature is obtained based on a corresponding first fusion operation rule; and
the indication, by each fusion operation rule, that a fusion operation is performed on the feature value of a specified static feature in the preset static feature set and the feature value of a specified dynamic feature in the preset dynamic feature set comprises:
the first fusion operation rule indicates that a mathematical operation is performed on the feature value of a first static feature in the preset static feature set and the feature value of a first dynamic feature in the preset dynamic feature set.
5. The method according to any one of claims 1 to 3, wherein the at least one candidate fusion feature comprises a second candidate fusion feature, and the feature value of the second candidate fusion feature is obtained based on a corresponding second fusion operation rule; and
the indication, by each fusion operation rule, that a fusion operation is performed on the feature value of a specified static feature in the preset static feature set and the feature value of a specified dynamic feature in the preset dynamic feature set comprises:
the second fusion operation rule indicates that a logical operation is performed on the feature value of a second static feature in the preset static feature set and the feature value of a second dynamic feature in the preset dynamic feature set.
6. The method according to any one of claims 1 to 3, wherein the at least one candidate fusion feature comprises a third candidate fusion feature, and the feature value of the third candidate fusion feature is obtained based on a corresponding third fusion operation rule; and
the indication, by each fusion operation rule, that a fusion operation is performed on the feature value of a specified static feature in the preset static feature set and the feature value of a specified dynamic feature in the preset dynamic feature set comprises:
the third fusion operation indicates that features that are the same both in the feature itself and in the feature value are determined from the preset static feature set and the preset dynamic feature set, and that the feature value of the third candidate fusion feature is calculated according to the total number of such features.
7. The method according to claim 6, wherein calculating the feature value of the third candidate fusion feature according to the total number of features that are the same both in the feature itself and in the feature value comprises:
determining the maximum of a first value and a second value, where the first value is the total number of static features contained in the preset static feature set, and the second value is the total number of dynamic features contained in the preset dynamic feature set; and
calculating the ratio of the total number of features that are the same both in the feature itself and in the feature value to the maximum, and using the ratio as the feature value of the third candidate fusion feature.
8. The method according to any one of claims 1 to 7, wherein training a program classification model according to the feature value of the target fusion feature in each sample program comprises:
training the program classification model according to the feature value of the target fusion feature in each sample program, the feature value of at least one static feature of each sample program, and the feature value of at least one dynamic feature of each sample program.
9. The method according to any one of claims 1 to 8, wherein the at least one dynamic feature comprises a parameter model of the sample program and/or at least one interface called by the sample program while it runs, and the parameter model is extracted from the parameters used by the sample program while it runs.
10. The method according to claim 9, wherein the at least one dynamic feature comprises a third dynamic feature; and
the feature value of the third dynamic feature of the selected sample program is the frequency of the third dynamic feature, the frequency of the third dynamic feature being the ratio of the number of times the third dynamic feature occurs in the selected sample program to the total number of dynamic features contained in the preset dynamic feature set.
11. A program classification method, wherein the method comprises:
obtaining a target program;
obtaining, according to a preset static feature set comprising at least one static feature and a preset dynamic feature set comprising at least one dynamic feature, a feature value of each static feature and a feature value of each dynamic feature of the target program, where the static features are features that reflect structural characteristics of the target program, and the dynamic features are behavioral features exhibited by the target program while it runs;
obtaining a feature value of at least one target fusion feature of the target program, where the feature value of the at least one target fusion feature of the target program is obtained based on a corresponding fusion operation rule, and the fusion operation rule indicates that a fusion operation is performed on the feature value of a specified static feature in the preset static feature set and the feature value of a specified dynamic feature in the preset dynamic feature set; and
inputting the feature value of the at least one target fusion feature of the target program into a program classification model to obtain a classification result of the target program.
12. The method according to claim 11, wherein the at least one target fusion feature comprises a first target fusion feature, and the feature value of the first target fusion feature is obtained based on a corresponding first fusion operation rule; and
the indication, by the fusion operation rule, that a fusion operation is performed on the feature value of a specified static feature in the preset static feature set and the feature value of a specified dynamic feature in the preset dynamic feature set comprises:
the first fusion operation rule indicates that a mathematical operation is performed on the feature value of a first static feature in the preset static feature set and the feature value of a first dynamic feature in the preset dynamic feature set.
13. The method according to claim 11, wherein the at least one target fusion feature comprises a second target fusion feature, and the feature value of the second target fusion feature is obtained based on a corresponding second fusion operation rule; and
the indication, by the fusion operation rule, that a fusion operation is performed on the feature value of a specified static feature in the preset static feature set and the feature value of a specified dynamic feature in the preset dynamic feature set comprises:
the second fusion operation rule indicates that a logical operation is performed on the feature value of a second static feature in the preset static feature set and the feature value of a second dynamic feature in the preset dynamic feature set.
14. The method according to claim 11, wherein the at least one target fusion feature comprises a third target fusion feature, and the feature value of the third target fusion feature is obtained based on a corresponding third fusion operation rule; and
the indication, by the fusion operation rule, that a fusion operation is performed on the feature value of a specified static feature in the preset static feature set and the feature value of a specified dynamic feature in the preset dynamic feature set comprises:
the third fusion operation indicates that features that are the same both in the feature itself and in the feature value are determined from the preset static feature set and the preset dynamic feature set, and that the feature value of the third target fusion feature is calculated according to the total number of such features.
15. The method according to claim 14, wherein calculating the feature value of the third target fusion feature according to the total number of features that are the same both in the feature itself and in the feature value comprises:
determining the maximum of a first value and a second value, where the first value is the total number of static features contained in the preset static feature set, and the second value is the total number of dynamic features contained in the preset dynamic feature set; and
calculating the ratio of the total number of features that are the same both in the feature itself and in the feature value to the maximum, and using the ratio as the feature value of the third target fusion feature.
16.根据权利要求11-15任一项所述的方法,其特征在于,所述目标程序的至少一个动态特征的包括:参数模型和/或预设接口。16. The method according to any one of claims 11-15, wherein at least one dynamic feature of the target program includes: a parametric model and/or a preset interface. 17.根据权利要求16所述的方法,其特征在于,若所述目标程序的至少一个动态特征包括参数模型和预设接口,则所述获取所述目标程序的动态特征的特征值包括:17. The method according to claim 16, wherein, if at least one dynamic feature of the target program includes a parametric model and a preset interface, the acquiring the characteristic value of the dynamic feature of the target program comprises: 获取所述目标程序在运行过程中所调用的预设接口以及所使用的参数;Obtain the preset interface called by the target program during the running process and the parameters used; 根据所述所使用的参数提取所述参数的参数模型;extracting a parametric model of the parameters according to the used parameters; 从所述目标程序的至少一个动态特征中选择出第三动态特征,将所述第三动态特征的频率作为所述第三动态特征的特征值,以此类推,从而得到所述目标程序的所有动态特征的特征值,所述第三动态特征的频率为所述第三动态特征在选择出的样本程序中出现的次数与所有预设动态特征集包括的动态特征的总数目之间的比值。A third dynamic feature is selected from at least one dynamic feature of the target program, and the frequency of the third dynamic feature is taken as the eigenvalue of the third dynamic feature, and so on, so as to obtain all the features of the target program. The feature value of the dynamic feature, the frequency of the third dynamic feature is the ratio between the number of times the third dynamic feature appears in the selected sample program and the total number of dynamic features included in all preset dynamic feature sets. 18.根据权利要求11-17任一项所述的方法,其特征在于,所述目标程序为多个,18. The method according to any one of claims 11-17, wherein the target program is multiple, 所述方法还包括:The method also includes: 根据多个所述目标程序中每个所述目标程序的至少一个目标融合特征的特征值,对多个所述目标程序进行聚类,得到每个所述目标程序的类别。According to the feature value of at least one target fusion feature of each of the plurality of target programs, the plurality of target programs are clustered to obtain the category of each target program. 19.一种程序分类模型的训练装置,其特征在于,所述装置包括:19. 
A training device for a program classification model, wherein the device comprises: 接收单元,用于接收输入的多个样本程序,所述样本程序是指所属的类别已被预先标定的程序,所述多个样本程序属于至少两个不同类别;a receiving unit, configured to receive input multiple sample programs, the sample programs refer to programs whose categories have been pre-calibrated, and the multiple sample programs belong to at least two different categories; 第一处理单元,用于从所述多个样本程序中选择出一个样本程序,执行以下处理从而得到选择出的样本程序的至少一个备选融合特征的特征值,直到处理完所述多个样本程序中的每个样本程序为止:a first processing unit, configured to select a sample program from the plurality of sample programs, and perform the following processing to obtain a feature value of at least one candidate fusion feature of the selected sample program, until the plurality of samples are processed Each sample program in the program so far: 依据包括至少一个静态特征的预设静态特征集、以及包括至少一个动态特征的预设动态特征集,获取选择出的样本程序的每个所述静态特征的特征值和每个所述动态特征的特征值,所述静态特征反映所述选择出的样本程序的结构特点,所述动态特征反映所述选择出的样本程序在运行过程中体现的行为;According to the preset static feature set including at least one static feature and the preset dynamic feature set including at least one dynamic feature, obtain the feature value of each static feature and the characteristic value of each dynamic feature of the selected sample program. 
feature value, the static feature reflects the structural feature of the selected sample program, and the dynamic feature reflects the behavior of the selected sample program in the running process; 根据所述选择出的样本程序的至少一个静态特征的特征值、至少一个动态特征的特征值以及至少一个融合操作规则,获得所述选择出的样本程序的至少一个备选融合特征的特征值,所述至少一个备选融合特征中的每个备选融合特征的特征值是基于对应的融合操作规则得到的,所述融合操作规则指示对所述预设静态特征集中的指定静态特征的特征值和所述预设动态特征集中的指定动态特征的特征值执行融合操作;obtaining the feature value of at least one candidate fusion feature of the selected sample program according to the feature value of at least one static feature, the feature value of at least one dynamic feature and the at least one fusion operation rule of the selected sample program, The feature value of each candidate fusion feature in the at least one candidate fusion feature is obtained based on the corresponding fusion operation rule, and the fusion operation rule indicates the feature value of the specified static feature in the preset static feature set. Perform a fusion operation with the eigenvalues of the specified dynamic features in the preset dynamic feature set; 第二处理单元,用于针对所述至少一个备选融合特征中的第一备选融合特征,执行以下处理,以此类推,从而得到每个备选融合特征的评价值:根据所述第一备选融合特征在每个样本程序中的特征值以及每个样本程序的类别,确定所述第一备选融合特征的评价值,所述评价值的大小体现所述第一备选融合特征用于区分样本程序所属类别的有效程度;The second processing unit is configured to perform the following processing for the first candidate fusion feature in the at least one candidate fusion feature, and so on, so as to obtain the evaluation value of each candidate fusion feature: according to the first candidate fusion feature The feature value of the candidate fusion feature in each sample program and the category of each sample program determine the evaluation value of the first candidate fusion feature, and the size of the evaluation value reflects that the first candidate fusion feature is used for the degree of effectiveness in distinguishing the categories to which the sample programs belong; 选择单元,用于根据所述每个备选融合特征的评价值,从所述至少一个备选融合特征中选择目标融合特征,所述目标融合特征的评价值体现的有效程度大于所述至少一个备选融合特征中的其他备选融合特征的评价值体现的有效程度;A 
selection unit, configured to select a target fusion feature from the at least one candidate fusion feature according to the evaluation value of each candidate fusion feature, and the evaluation value of the target fusion feature reflects a degree of effectiveness greater than that of the at least one The degree of effectiveness reflected by the evaluation values of other candidate fusion features in the candidate fusion features; 训练单元,用于根据所述每个样本程序中所述目标融合特征的特征值,训练得到程序分类模型。A training unit, configured to train a program classification model according to the feature value of the target fusion feature in each sample program. 20.根据权利要求19所述的装置,其特征在于,所述根据所述第一备选融合特征在每个样本程序中的特征值以及每个样本程序的类别,确定所述第一备选融合特征的评价值包括:20. The apparatus according to claim 19, wherein the first candidate is determined according to the feature value of the first candidate fusion feature in each sample program and the category of each sample program The evaluation values of fusion features include: 按照样本程序所属的类别,统计每个类别的样本程序中所述第一备选融合特征的特征值,从而得到所述第一备选融合特征在各个类别的统计值;根据所述第一备选融合特征在各个类别的统计值,确定所述第一备选融合特征的评价值。According to the category to which the sample program belongs, count the eigenvalues of the first candidate fusion features in the sample programs of each category, so as to obtain the statistical values of the first candidate fusion features in each category; The statistical values of the fusion features in each category are selected to determine the evaluation value of the first candidate fusion feature. 21.根据权利要求19或20所述的装置,其特征在于,所述第一备选融合特征的特征值是基于对应的第一融合操作规则得到的;21. 
The apparatus according to claim 19 or 20, wherein the feature value of the first candidate fusion feature is obtained based on the corresponding first fusion operation rule; 所述每个融合操作规则指示对所述预设静态特征集中的指定静态特征的特征值和所述预设动态特征集中的指定动态特征的特征值执行融合操作包括:The instruction of each fusion operation rule to perform a fusion operation on the eigenvalues of the specified static features in the preset static feature set and the eigenvalues of the specified dynamic features in the preset dynamic feature set includes: 所述第一融合操作规则指示对所述预设静态特征集中的第一静态特征的特征值和所述预设动态特征集中的第一动态特征的特征值执行数学运算。The first fusion operation rule instructs to perform a mathematical operation on the feature value of the first static feature in the preset static feature set and the feature value of the first dynamic feature in the preset dynamic feature set. 22.根据权利要求19或20所述的装置,其特征在于,所述至少一个备选融合特征包括第二备选融合特征,所述第二备选融合特征的特征值是基于对应的第二融合操作规则得到的;22. The apparatus according to claim 19 or 20, wherein the at least one candidate fusion feature comprises a second candidate fusion feature, and the feature value of the second candidate fusion feature is based on a corresponding second fusion feature. obtained by integrating the operating rules; 所述每个融合操作规则指示对所述预设静态特征集中的指定静态特征的特征值和所述预设动态特征集中的指定动态特征的特征值执行融合操作包括:The instruction of each fusion operation rule to perform a fusion operation on the eigenvalues of the specified static features in the preset static feature set and the eigenvalues of the specified dynamic features in the preset dynamic feature set includes: 所述第二融合操作规则指示对所述预设静态特征集中的第二静态特征的特征值和所述预设动态特征集中的第二动态特征的特征值执行逻辑操作。The second fusion operation rule instructs to perform a logical operation on the feature value of the second static feature in the preset static feature set and the feature value of the second dynamic feature in the preset dynamic feature set. 23.根据权利要求19或20所述的装置,其特征在于,所述至少一个备选融合特征包括第三备选融合特征,所述第三备选融合特征的特征值是基于对应的第三融合操作规则得到的;23. 
The apparatus according to claim 19 or 20, wherein the at least one candidate fusion feature comprises a third candidate fusion feature, and the feature value of the third candidate fusion feature is based on the corresponding third obtained by integrating the operating rules; 所述每个融合操作规则指示对所述预设静态特征集中的指定静态特征的特征值和所述预设动态特征集中的指定动态特征的特征值执行融合操作包括:The instruction of each fusion operation rule to perform a fusion operation on the eigenvalues of the specified static features in the preset static feature set and the eigenvalues of the specified dynamic features in the preset dynamic feature set includes: 所述第三融合操作指示从所述预设静态特征集和所述预设动态特征集中确定特征本身相同、且特征值相同的特征,并根据所述特征本身相同且特征值相同的特征的总数目计算所述第三备选融合特征的特征值。The third fusion operation instructs to determine, from the preset static feature set and the preset dynamic feature set, features with the same feature itself and the same feature value, and according to the total number of features with the same feature itself and the same feature value The purpose is to calculate the eigenvalue of the third candidate fusion feature. 24.一种程序分类装置,其特征在于,所述装置包括:24. 
A program classification device, characterized in that the device comprises:
a program acquisition unit, configured to acquire a target program;
a first feature value obtaining unit, configured to obtain, according to a preset static feature set including at least one static feature and a preset dynamic feature set including at least one dynamic feature, a feature value of each static feature and a feature value of each dynamic feature of the target program, wherein the static features are features that reflect structural characteristics of the target program, and the dynamic features are behavioral characteristics that the target program exhibits while running;
a second feature value obtaining unit, configured to obtain a feature value of at least one target fusion feature of the target program, wherein the feature value of the at least one target fusion feature is obtained based on a corresponding fusion operation rule, and the fusion operation rule instructs that a fusion operation be performed on the feature value of a specified static feature in the preset static feature set and the feature value of a specified dynamic feature in the preset dynamic feature set; and
a classification unit, configured to input the feature value of the at least one target fusion feature of the target program into a program classification model to obtain a classification result for the target program.

25. The device according to claim 24, wherein the at least one target fusion feature comprises a first target fusion feature, and the feature value of the first target fusion feature is obtained based on a corresponding first fusion operation rule; and
the fusion operation rule instructing that a fusion operation be performed on the feature value of the specified static feature in the preset static feature set and the feature value of the specified dynamic feature in the preset dynamic feature set comprises:
the first fusion operation rule instructing that a mathematical operation be performed on the feature value of a first static feature in the preset static feature set and the feature value of a first dynamic feature in the preset dynamic feature set.

26. The device according to claim 24, wherein the at least one target fusion feature comprises a second target fusion feature, and the feature value of the second target fusion feature is obtained based on a corresponding second fusion operation rule; and
the fusion operation rule instructing that a fusion operation be performed on the feature value of the specified static feature in the preset static feature set and the feature value of the specified dynamic feature in the preset dynamic feature set comprises:
the second fusion operation rule instructing that a logical operation be performed on the feature value of a second static feature in the preset static feature set and the feature value of a second dynamic feature in the preset dynamic feature set.

27. The device according to claim 24, wherein the at least one target fusion feature comprises a third target fusion feature, and the feature value of the third target fusion feature is obtained based on a corresponding third fusion operation rule; and
the fusion operation rule instructing that a fusion operation be performed on the feature value of the specified static feature in the preset static feature set and the feature value of the specified dynamic feature in the preset dynamic feature set comprises:
the third fusion operation instructing that features which are the same feature and have the same feature value be identified in the preset static feature set and the preset dynamic feature set, and that the feature value of the third target fusion feature be calculated according to the total number of such features.
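The device of claims 24 to 27 can be illustrated with a minimal Python sketch. The claims do not name concrete features, operations, or a model, so everything below is an assumption for illustration only: each feature set is modeled as a dictionary from feature name to value; the "mathematical operation" of claim 25 is taken to be a product; the "logical operation" of claim 26 is taken to be AND over non-zero values; the claim 27 fused value is taken to be the match count itself; and a linear scorer with hypothetical weights stands in for the trained program classification model. All feature names and the `malicious`/`benign` labels are hypothetical.

```python
from typing import Dict, List, Sequence

# Hypothetical representation: a feature set maps feature name -> feature value.
FeatureSet = Dict[str, float]


def math_fusion(static: FeatureSet, dynamic: FeatureSet, s_name: str, d_name: str) -> float:
    """Claim 25-style rule: a mathematical operation on one static and one
    dynamic feature value (a product is assumed purely for illustration)."""
    return static[s_name] * dynamic[d_name]


def logic_fusion(static: FeatureSet, dynamic: FeatureSet, s_name: str, d_name: str) -> float:
    """Claim 26-style rule: a logical operation (AND is assumed) on the two
    values, treating any non-zero value as true; returns 1.0 or 0.0."""
    return 1.0 if (static[s_name] != 0 and dynamic[d_name] != 0) else 0.0


def match_count_fusion(static: FeatureSet, dynamic: FeatureSet) -> float:
    """Claim 27-style rule: count features present in both sets with the same
    value; using the count directly as the fused value is an assumption."""
    return float(sum(1 for name, value in static.items()
                     if name in dynamic and dynamic[name] == value))


def fuse(static: FeatureSet, dynamic: FeatureSet) -> List[float]:
    """Apply a hypothetical list of fusion rules to produce the
    target fusion feature values that feed the model."""
    return [
        math_fusion(static, dynamic, "num_network_apis", "network_calls"),
        logic_fusion(static, dynamic, "declares_sms_permission", "sent_sms"),
        match_count_fusion(static, dynamic),
    ]


def classify(fused: Sequence[float], weights: Sequence[float], bias: float) -> str:
    """Stand-in for the trained program classification model: a linear
    score with hypothetical weights, thresholded at zero."""
    score = bias + sum(w * x for w, x in zip(weights, fused))
    return "malicious" if score > 0 else "benign"


if __name__ == "__main__":
    # Hypothetical static (structural) and dynamic (runtime) feature values.
    static = {"num_network_apis": 3.0, "declares_sms_permission": 1.0, "uses_reflection": 1.0}
    dynamic = {"network_calls": 4.0, "sent_sms": 1.0, "uses_reflection": 1.0}
    fused = fuse(static, dynamic)
    print(fused)  # [12.0, 1.0, 1.0]
    print(classify(fused, weights=[0.1, 2.0, 0.5], bias=-3.0))  # malicious
```

Note that only the fused values reach the model here, mirroring the classification unit of claim 24, which inputs the target fusion feature values rather than the raw static and dynamic values.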
CN201811419260.7A 2018-11-26 2018-11-26 Program classification model training method, program classification method and device Withdrawn CN111222137A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201811419260.7A CN111222137A (en) 2018-11-26 2018-11-26 Program classification model training method, program classification method and device
PCT/CN2019/119587 WO2020108357A1 (en) 2018-11-26 2019-11-20 Program classification model training method, program classification method, and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811419260.7A CN111222137A (en) 2018-11-26 2018-11-26 Program classification model training method, program classification method and device

Publications (1)

Publication Number Publication Date
CN111222137A (en) 2020-06-02

Family

ID=70826987

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811419260.7A Withdrawn CN111222137A (en) 2018-11-26 2018-11-26 Program classification model training method, program classification method and device

Country Status (2)

Country Link
CN (1) CN111222137A (en)
WO (1) WO2020108357A1 (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111723371A (en) * 2020-06-22 2020-09-29 上海斗象信息科技有限公司 Building a detection model for malicious files and a method for detecting malicious files
CN113435605A (en) * 2021-06-25 2021-09-24 烽火通信科技股份有限公司 Control method and device for AI dynamic injection based on network data pool
CN113837210A (en) * 2020-06-23 2021-12-24 腾讯科技(深圳)有限公司 Applet classifying method, device, equipment and computer readable storage medium
CN113988145A (en) * 2020-07-10 2022-01-28 华为技术有限公司 Application type identification method and device, terminal equipment and readable storage medium
CN117338263A (en) * 2023-12-04 2024-01-05 中国人民解放军总医院海南医院 A real-time safe monitoring method for body temperature and heart rate of wearable devices

Families Citing this family (15)

Publication number Priority date Publication date Assignee Title
US11507663B2 (en) 2014-08-11 2022-11-22 Sentinel Labs Israel Ltd. Method of remediating operations performed by a program and system thereof
US9710648B2 (en) 2014-08-11 2017-07-18 Sentinel Labs Israel Ltd. Method of malware detection and system thereof
US11695800B2 (en) 2016-12-19 2023-07-04 SentinelOne, Inc. Deceiving attackers accessing network data
US10462171B2 (en) 2017-08-08 2019-10-29 Sentinel Labs Israel Ltd. Methods, systems, and devices for dynamically modeling and grouping endpoints for edge networking
CN111680297B (en) * 2020-07-09 2025-06-24 腾讯科技(深圳)有限公司 Artificial intelligence-based script file detection method, device and electronic device
CN112036446B (en) * 2020-08-06 2023-12-12 汇纳科技股份有限公司 Method, system, medium and device for fusing target identification features
US11899782B1 (en) 2021-07-13 2024-02-13 SentinelOne, Inc. Preserving DLL hooks
US12452273B2 (en) 2022-03-30 2025-10-21 SentinelOne, Inc. Systems, methods, and devices for preventing credential passing attacks
CN115127635A (en) * 2022-07-01 2022-09-30 广汽丰田汽车有限公司 Method for quantitatively detecting brake fluid filling, terminal device and storage medium
WO2024044559A1 (en) * 2022-08-22 2024-02-29 SentinelOne, Inc. Systems and methods of data selection for iterative training using zero knowledge clustering
CN116912725A (en) * 2022-12-27 2023-10-20 中移物联网有限公司 A target object detection method, device, electronic equipment and readable storage medium
WO2024152041A1 (en) 2023-01-13 2024-07-18 SentinelOne, Inc. Classifying cybersecurity threats using machine learning on non-euclidean data
CN117852032A (en) * 2023-12-05 2024-04-09 武汉纺织大学 Robust detection method for reinforcing android malicious applications based on mixed features
CN118410354B (en) * 2024-07-04 2024-10-01 北京安天网络安全技术有限公司 Method, device, equipment and medium for acquiring dynamic behaviors of samples
CN118865333B (en) * 2024-09-29 2024-12-13 成都赛力斯科技有限公司 Off-vehicle biological detection method and device, electronic equipment and readable storage medium

Family Cites Families (4)

Publication number Priority date Publication date Assignee Title
CN103810424B (en) * 2012-11-05 2017-02-08 腾讯科技(深圳)有限公司 Method and device for identifying abnormal application programs
CN107180191A (en) * 2017-05-03 2017-09-19 北京理工大学 A kind of malicious code analysis method and system based on semi-supervised learning
CN107180192B (en) * 2017-05-09 2020-05-29 北京理工大学 Android malicious application detection method and system based on multi-feature fusion
CN108345794A (en) * 2017-12-29 2018-07-31 北京物资学院 The detection method and device of Malware

Cited By (9)

Publication number Priority date Publication date Assignee Title
CN111723371A (en) * 2020-06-22 2020-09-29 上海斗象信息科技有限公司 Building a detection model for malicious files and a method for detecting malicious files
CN111723371B (en) * 2020-06-22 2024-02-20 上海斗象信息科技有限公司 Build a detection model for malicious files and a method for detecting malicious files
CN113837210A (en) * 2020-06-23 2021-12-24 腾讯科技(深圳)有限公司 Applet classifying method, device, equipment and computer readable storage medium
WO2021258968A1 (en) * 2020-06-23 2021-12-30 腾讯科技(深圳)有限公司 Applet classification method, apparatus and device, and computer readable storage medium
US12229547B2 (en) 2020-06-23 2025-02-18 Tencent Technology (Shenzhen) Company Limited Miniprogram classification method, apparatus, and device, and computer-readable storage medium
CN113988145A (en) * 2020-07-10 2022-01-28 华为技术有限公司 Application type identification method and device, terminal equipment and readable storage medium
CN113435605A (en) * 2021-06-25 2021-09-24 烽火通信科技股份有限公司 Control method and device for AI dynamic injection based on network data pool
CN117338263A (en) * 2023-12-04 2024-01-05 中国人民解放军总医院海南医院 A real-time safe monitoring method for body temperature and heart rate of wearable devices
CN117338263B (en) * 2023-12-04 2024-02-09 中国人民解放军总医院海南医院 Real-time safety monitoring method for body temperature and heart rate of wearable equipment

Also Published As

Publication number Publication date
WO2020108357A1 (en) 2020-06-04

Similar Documents

Publication Publication Date Title
CN111222137A (en) Program classification model training method, program classification method and device
CN111382434B (en) System and method for detecting malicious files
RU2454714C1 (en) System and method of increasing efficiency of detecting unknown harmful objects
EP2955658B1 (en) System and methods for detecting harmful files of different formats
US9781144B1 (en) Determining duplicate objects for malware analysis using environmental/context information
JP6088713B2 (en) Vulnerability discovery device, vulnerability discovery method, and vulnerability discovery program
US8108931B1 (en) Method and apparatus for identifying invariants to detect software tampering
CN112602081A (en) Enhancing network security and operational monitoring with alarm confidence assignment
CN106997367B (en) Program file classification method, classification device and classification system
JP6697123B2 (en) Profile generation device, attack detection device, profile generation method, and profile generation program
US11163877B2 (en) Method, server, and computer storage medium for identifying virus-containing files
US11295013B2 (en) Dimensionality reduction based on functionality
US12524523B2 (en) Cyber threat information processing apparatus, cyber threat information processing method, and storage medium storing cyber threat information processing program
CN111338692B (en) Vulnerability classification method, device and electronic device based on vulnerability code
CN114936366A (en) Malicious software family tag correction method and device based on hybrid analysis
O'Kane et al. N-gram density based malware detection
CN119397533A (en) Malicious script detection method, device, equipment and storage medium
Khan et al. Op2Vec: An Opcode Embedding Technique and Dataset Design for End‐to‐End Detection of Android Malware
Ognev et al. Clustering of malicious executable files based on the sequence analysis of system calls
WO2018177602A1 (en) Malware detection in applications based on presence of computer generated strings
CN114510713A (en) Method and device for detecting malicious software, electronic equipment and storage medium
Rowe Identifying forensically uninteresting files using a large corpus
US11222113B1 (en) Automatically generating malware definitions using word-level analysis
KR101645214B1 (en) Method and apparatus for malicious code classification
US11563717B2 (en) Generation method, generation device, and recording medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication (application publication date: 20200602)