CN118586009A - Code defect vulnerability scanning and determination method based on knowledge base

Info

Publication number: CN118586009A
Application number: CN202411072469.6A
Authority: CN
Inventors: 张世通; 赵亚舟; 陈梦晖; 郭鑫; 闫卫杰; 冯智; 魏满红
Original assignee: Beijing Keyware Co ltd
Current assignee: Beijing Keyware Co ltd
Priority date: 2024-08-06
Filing date: 2024-08-06
Publication date: 2024-09-03

Abstract

The present invention relates to the technical field of code detection, and in particular to a code defect vulnerability scanning and determination method based on a knowledge base. The method comprises: generating sub-code segments; calculating an abnormal tendency evaluation value of a target code; analyzing whether the abnormal tendency of the target code is qualified; performing an attack test and marking the sub-code segments; regenerating new sub-code segments; and adjusting an attack frequency. The abnormal tendency of the target code is analyzed comprehensively based on the conditional probability between the sub-code segments and their repetition degree, which improves the control accuracy for the structural complexity of the target code. An attack test is performed on target code whose abnormal tendency is unqualified, each sub-code segment is marked according to the test result, and sub-code segments with vulnerabilities are stored in a vulnerability database. Target code segments can then be compared against the code segments in the vulnerability database to determine whether they contain a vulnerability, which improves the efficiency of vulnerability identification.

Description

Code defect vulnerability scanning and determination method based on knowledge base

Technical Field

The present invention relates to the technical field of code detection, and in particular to a code defect vulnerability scanning and determination method based on a knowledge base.

Background Art

Code defect vulnerabilities are an important security issue that must be addressed through comprehensive closed-loop management. Automated scanning tools, manual penetration testing, and similar methods are used to discover the security vulnerabilities present in systems and applications. This discovery is the first step of closed-loop management and lays the foundation for subsequent evaluation and repair. Existing techniques use machine self-learning, based on historical code audit information, to learn to identify valid defects, which enhances the security of application software systems; however, their efficiency in identifying code vulnerabilities is low.

Chinese patent application No. CN202310636553.5 discloses a static audit and detection system for source code defects, comprising: an AI automatic audit module, which uses machine self-learning to learn to identify valid defects based on historical code audit information, a function whitelist, and a code defect knowledge base; a third-party component detection module, which detects vulnerabilities in open-source code according to the defect rule set of a public vulnerability library; a bytecode scanning module, which uses Java bytecode scanning to provide graphical display and defect location tracking; and a deployment module, which supports virtualized cloud deployment, distributed deployment, and image deployment. The stated advantages of that application are: addressing problems at the root, by solving security problems at the source of the code, shifting security left, and enhancing the security and robustness of the application software system; autonomy and controllability, in that the product is independently developed, secure and controllable, and meets Xinchuang (domestic IT innovation) requirements; flexible deployment, in that the product supports virtualized deployment, distributed deployment, and Docker image deployment; and cost-effectiveness, with high performance indicators at a low price. However, that source code defect static audit and detection system has the following problem: its efficiency in identifying code vulnerabilities is low.

Summary of the Invention

To this end, the present invention provides a code defect vulnerability scanning and determination method based on a knowledge base, to overcome the low efficiency of code vulnerability identification in the prior art.

To achieve the above objective, the present invention provides a code defect vulnerability scanning and determination method based on a knowledge base, comprising:

Step S1: dividing the target code of the knowledge base into several sub-code segments;

Step S2: calculating an abnormal tendency evaluation value of the target code based on the code repetition degree of the sub-code segments and the conditional probability between the sub-code segments;

Step S3: analyzing whether the abnormal tendency of the target code is qualified based on the abnormal tendency evaluation value;

Step S4: performing an attack test on target code whose abnormal tendency is unqualified, and marking each sub-code segment;

Step S5: regenerating new sub-code segments based on the second-level sub-code segments, and performing an attack test to mark the new sub-code segments;

Step S6: adjusting the attack frequency for the sub-code segments based on the proportion of first-level sub-code segments.

Furthermore, in step S2, the conditional probabilities between the sub-code segments are calculated, and the abnormal tendency evaluation value S for the target code is calculated from the average value P of the calculated conditional probabilities and the code repetition degree C of the sub-code segments, with S=α×P/P0+β×C/C0, where α is the conditional probability weight coefficient, β is the repetition degree weight coefficient, P0 is the preset average value, and C0 is the preset code repetition degree.

Furthermore, in step S3, whether the abnormal tendency of the target code is qualified is analyzed based on the abnormal tendency evaluation value:

If the abnormal tendency evaluation value is less than or equal to a first preset abnormal tendency evaluation value, the abnormal tendency of the target code is determined to be qualified;

If the abnormal tendency evaluation value is greater than the first preset abnormal tendency evaluation value, the abnormal tendency of the target code is determined to be unqualified, an attack test is performed on each sub-code segment, and a test log is recorded.

Furthermore, in step S4, when the attack test on each sub-code segment is complete, each sub-code segment is marked based on the error log ratio of the attack test for that sub-code segment:

If the error log ratio is less than or equal to a first preset error log ratio, the sub-code segment is marked as a first-level sub-code segment;

If the error log ratio is greater than the first preset error log ratio and less than or equal to a second preset error log ratio, the sub-code segment is marked as a second-level sub-code segment and stored in a temporary database;

If the error log ratio is greater than the second preset error log ratio, the sub-code segment is marked as a third-level sub-code segment and stored in a vulnerability database.

Furthermore, when the marking of the sub-code segments is complete, the first preset abnormal tendency evaluation value is adjusted based on the proportion of second-level sub-code segments among the marked sub-code segments.

Furthermore, when the marking of the sub-code segments is complete, the sub-code segments in the temporary database are randomly connected to randomly generate several new sub-code segments, an attack test is performed on each new sub-code segment, and each new sub-code segment is marked based on its new error log ratio after the attack test:

If the new error log ratio is less than or equal to the second preset error log ratio, the new sub-code segment is marked as a second-level sub-code segment and stored in the temporary database;

If the new error log ratio is greater than the second preset error log ratio, the new sub-code segment is marked as a third-level sub-code segment and stored in the vulnerability database.

Furthermore, when the marking of the new sub-code segments is complete, the number of new sub-code segments to generate is determined based on the third-level proportion, that is, the proportion of third-level sub-code segments:

If the third-level proportion is less than or equal to a preset third-level proportion, the number of new sub-code segments to generate is set to a first-level number;

If the third-level proportion is greater than the preset third-level proportion, the number of new sub-code segments to generate is set to a second-level number.

Furthermore, when the marking of the sub-code segments is complete, the attack frequency used in the attack test for each sub-code segment is corrected based on the first-level proportion, that is, the proportion of sub-code segments marked as first-level.

Furthermore, when the marking of the sub-code segments is complete, whether to update the component version is determined based on the proportion of third-level sub-code segments:

If the proportion of third-level sub-code segments is greater than or equal to a preset proportion of third-level sub-code segments, the component version is determined to be outdated and the component version is updated;

If the proportion of third-level sub-code segments is less than the preset proportion of third-level sub-code segments, the component version is determined to be normal and is not updated.

Furthermore, the attack test comprises: analyzing whether a sub-code segment has a vulnerability based on the matching degree between known vulnerable code segments in big data and the sub-code segment; and testing the sub-code segment using black-box testing.

Compared with the prior art, the beneficial effects of the present invention are as follows. The greater the conditional probability between sub-code segments, the more closely the sub-code segments are connected and the stronger the dependencies within the code, so the code needs more rigorous testing; and the higher the repetition degree of the sub-code segments, the harder the code is to maintain. The present invention therefore analyzes the abnormal tendency of the target code comprehensively based on the conditional probability between the sub-code segments and their repetition degree, which improves the control accuracy for the structural complexity of the target code. An attack test is performed on target code whose abnormal tendency is unqualified, each sub-code segment is marked according to the test results, and sub-code segments with vulnerabilities are stored in a vulnerability database, so that target code segments can later be compared with the code segments in the vulnerability database to determine whether they have vulnerabilities, which improves the efficiency of vulnerability identification.

Furthermore, the present invention stores second-level sub-code segments in a temporary database, recombines them to generate new sub-code segments, tests the newly generated sub-code segments, and stores those with vulnerabilities in the vulnerability database, thereby expanding the variety of sub-code segments in the vulnerability database.

Furthermore, the present invention reduces the first preset abnormal tendency evaluation value according to the proportion of second-level sub-code segments, which narrows the range of target code judged qualified, widens the range of target code requiring further analysis, and improves the analysis accuracy for target code segments.

Brief Description of the Drawings

FIG. 1 is a flow chart of the code defect vulnerability scanning and determination method based on a knowledge base according to the present invention;

FIG. 2 is a flow chart of analyzing whether the abnormal tendency of the target code is qualified according to the present invention;

FIG. 3 is a flow chart of marking the sub-code segments according to the present invention;

FIG. 4 is a flow chart of adjusting the first preset abnormal tendency evaluation value according to the present invention.

Detailed Description

To make the objects and advantages of the present invention clearer, the present invention is further described below in conjunction with embodiments. It should be understood that the specific embodiments described herein serve only to explain the present invention and do not limit it.

It should be noted that the data in this embodiment were obtained by the system of the present invention through comprehensive analysis and evaluation of the historical data from the six months preceding this determination and of the corresponding historical determination results. Before this detection, the system determined the values of the preset parameter standards for this determination from the evaluation values of the 37,383 retrieval results detected cumulatively over the preceding three months. Those skilled in the art will understand that, for any single parameter, the system may select the value with the highest share in the data distribution as the preset standard parameter, use a weighted sum and take the result as the preset standard parameter, substitute the historical data into a specific formula and take the resulting value as the preset standard parameter, or use another selection method, provided that the values obtained allow the system to distinguish clearly between the different specific cases in a single determination process.

Preferred embodiments of the present invention are described below with reference to the accompanying drawings. Those skilled in the art should understand that these embodiments serve only to explain the technical principles of the present invention and do not limit its scope of protection.

It should be noted that, in the description of the present invention, terms indicating direction or positional relationship, such as "up", "down", "left", "right", "inside", and "outside", are based on the directions or positional relationships shown in the drawings. They are used only for convenience of description and do not indicate or imply that the device or element must have a specific orientation or be constructed and operated in a specific orientation; they therefore cannot be understood as limiting the present invention.

In addition, in the description of the present invention, unless otherwise expressly specified and limited, the terms "installed", "coupled", and "connected" should be understood broadly. For example, a connection may be fixed, detachable, or integral; it may be mechanical or electrical; it may be direct or indirect through an intermediate medium; or it may be an internal communication between two elements. Those skilled in the art can understand the specific meanings of these terms in the present invention according to the specific circumstances.

Please refer to FIG. 1, which is a flow chart of the code defect vulnerability scanning and determination method based on a knowledge base according to the present invention.

An embodiment of the present invention provides a code defect vulnerability scanning and determination method based on a knowledge base, comprising steps S1 to S6 as set out in the Summary of the Invention.

Specifically, in step S2, the conditional probabilities between the sub-code segments are calculated, and the abnormal tendency evaluation value S for the target code is calculated from the average value P of the calculated conditional probabilities and the code repetition degree C of the sub-code segments, with S=α×P/P0+β×C/C0, where α is the conditional probability weight coefficient, β is the repetition degree weight coefficient, P0 is the preset average value, and C0 is the preset code repetition degree.

In this embodiment of the present invention, the conditional probability weight coefficient α is 0.35, the repetition degree weight coefficient β is 0.65, the preset average value P0 is 0.65, and the preset code repetition degree C0 is 0.75.
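As a worked illustration of the formula, the following minimal Python sketch computes S with the coefficients of this embodiment. The function name `abnormal_tendency_score` and the sample inputs are assumptions made for illustration only; the description does not specify how P and C are measured.

```python
# Embodiment constants taken from the description.
ALPHA = 0.35  # conditional probability weight coefficient
BETA = 0.65   # repetition degree weight coefficient
P0 = 0.65     # preset average conditional probability
C0 = 0.75     # preset code repetition degree


def abnormal_tendency_score(p_mean: float, repetition: float) -> float:
    """Return S = ALPHA * P / P0 + BETA * C / C0.

    p_mean is the average conditional probability P between sub-code
    segments; repetition is the code repetition degree C. How these two
    inputs are measured is not specified in the source.
    """
    return ALPHA * p_mean / P0 + BETA * repetition / C0


# Hypothetical inputs chosen only for illustration.
score = abnormal_tendency_score(p_mean=0.78, repetition=0.9)
print(f"S = {score:.2f}")
```

With these sample inputs both normalized terms equal 1.2, so S is about 1.20. When P and C equal their preset values P0 and C0, S equals α+β, which is 1.0 for this embodiment.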

In the present invention, the greater the conditional probability between sub-code segments, the more closely the sub-code segments are connected and the stronger the dependencies within the code, so the code needs more rigorous testing; and the higher the repetition degree of the sub-code segments, the harder the code is to maintain. The abnormal tendency of the target code is therefore analyzed comprehensively based on the conditional probability between the sub-code segments and their repetition degree, which improves the control accuracy for the structural complexity of the target code. An attack test is performed on target code whose abnormal tendency is unqualified, each sub-code segment is marked according to the test results, and sub-code segments with vulnerabilities are stored in the vulnerability database, so that target code segments can later be compared with the code segments in the vulnerability database to determine whether they have vulnerabilities, which improves the efficiency of vulnerability identification.

Please refer to FIG. 2, which is a flow chart of analyzing whether the abnormal tendency of the target code is qualified.

Specifically, in step S3, whether the abnormal tendency of the target code is qualified is analyzed based on the abnormal tendency evaluation value.

In this embodiment of the present invention, the first preset abnormal tendency evaluation value is 1.05.
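The qualification decision of step S3 can be sketched as a single comparison against this threshold. The function name `is_qualified` and the sample scores below are illustrative assumptions.

```python
FIRST_PRESET_SCORE = 1.05  # first preset abnormal tendency evaluation value


def is_qualified(score: float, threshold: float = FIRST_PRESET_SCORE) -> bool:
    """Return True when the abnormal tendency is qualified (S <= threshold)."""
    return score <= threshold


# Hypothetical evaluation values; an unqualified result triggers the
# attack test on every sub-code segment and the recording of a test log.
for s in (0.98, 1.05, 1.20):
    verdict = "qualified" if is_qualified(s) else "unqualified: run attack test"
    print(s, verdict)
```

The boundary value itself counts as qualified, because the description uses "less than or equal to" for the qualified branch.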

Please refer to FIG. 3, which is a flow chart of marking the sub-code segments.

Specifically, in step S4, when the attack test on each sub-code segment is complete, each sub-code segment is marked based on the error log ratio of the attack test for that sub-code segment.

In this embodiment of the present invention, the error log ratio is the ratio of the number of error logs to the total number of logs, the first preset error log ratio is 0.3, and the second preset error log ratio is 0.4.
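A minimal sketch of this three-way marking, using the embodiment's thresholds, is given below. The function names, the in-memory lists standing in for the two databases, and the sample log counts are assumptions for illustration.

```python
FIRST_PRESET_RATIO = 0.3   # first preset error log ratio
SECOND_PRESET_RATIO = 0.4  # second preset error log ratio


def error_log_ratio(error_logs: int, total_logs: int) -> float:
    """Ratio of error logs to all logs recorded for one sub-code segment."""
    if total_logs <= 0:
        raise ValueError("total_logs must be positive")
    return error_logs / total_logs


def mark_segment(ratio: float) -> int:
    """Return the level (1, 2 or 3) assigned to a sub-code segment."""
    if ratio <= FIRST_PRESET_RATIO:
        return 1  # first-level sub-code segment
    if ratio <= SECOND_PRESET_RATIO:
        return 2  # second-level: stored in the temporary database
    return 3      # third-level: stored in the vulnerability database


# Plain lists stand in for the two databases (an assumption of this sketch).
temporary_db, vulnerability_db = [], []
# Hypothetical (segment name, error logs, total logs) records.
for name, errors, total in [("seg_a", 10, 100), ("seg_b", 35, 100), ("seg_c", 60, 100)]:
    level = mark_segment(error_log_ratio(errors, total))
    if level == 2:
        temporary_db.append(name)
    elif level == 3:
        vulnerability_db.append(name)
```

In this example `seg_b` (ratio 0.35) lands in the temporary database and `seg_c` (ratio 0.6) in the vulnerability database, while `seg_a` is only marked as first-level.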

The present invention stores second-level sub-code segments in a temporary database, recombines them to generate new sub-code segments, tests the newly generated sub-code segments, and stores those with vulnerabilities in the vulnerability database, thereby expanding the variety of sub-code segments in the vulnerability database.

Please refer to FIG. 4, which is a flow chart of adjusting the first preset abnormal tendency evaluation value.

Specifically, when the marking of the sub-code segments is complete, the first preset abnormal tendency evaluation value is adjusted based on the quantity proportion of second-level sub-code segments:

If the quantity proportion is less than or equal to a preset quantity proportion, a first adjustment coefficient t1 is selected to adjust the first preset abnormal tendency evaluation value D, and the adjusted value is set as D'=t1×D0, where D0 is the initial first preset abnormal tendency evaluation value before adjustment;

If the quantity proportion is greater than the preset quantity proportion, a second adjustment coefficient t2 is selected to adjust the first preset abnormal tendency evaluation value D, and the adjusted value is set as D'=t2×D0.

In this embodiment of the present invention, the first adjustment coefficient t1 is 0.9 and the second adjustment coefficient t2 is 0.8.
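The adjustment can be sketched as follows. The coefficients come from this embodiment; the preset quantity proportion is not given in the description, so the value 0.2 below is purely an assumption, as is the function name.

```python
T1 = 0.9  # first adjustment coefficient
T2 = 0.8  # second adjustment coefficient
PRESET_QUANTITY_RATIO = 0.2  # ASSUMPTION: the source gives no value


def adjust_threshold(d0: float, second_level_ratio: float,
                     preset_ratio: float = PRESET_QUANTITY_RATIO) -> float:
    """Return D' = t * D0, with t chosen by the second-level proportion."""
    t = T1 if second_level_ratio <= preset_ratio else T2
    return t * d0


# Starting from the embodiment's initial threshold D0 = 1.05.
low = adjust_threshold(1.05, second_level_ratio=0.1)   # uses t1
high = adjust_threshold(1.05, second_level_ratio=0.5)  # uses t2
print(round(low, 3), round(high, 3))
```

Both coefficients are below 1, so the threshold always decreases; a larger share of second-level sub-code segments lowers it further (to about 0.84 instead of about 0.945 in this example).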

The present invention reduces the first preset abnormal tendency evaluation value according to the proportion of second-level sub-code segments, which narrows the range of target code judged qualified, widens the range of target code requiring further analysis, and improves the analysis accuracy for target code segments.

Specifically, when the marking of the sub-code segments is complete, the sub-code segments in the temporary database are randomly connected to randomly generate several new sub-code segments, an attack test is performed on each new sub-code segment, and each new sub-code segment is marked based on its new error log ratio after the attack test.
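One possible reading of this regeneration step is sketched below. The description only says that stored segments are randomly connected, so joining two distinct segments with a newline, the function names, and the sample segments are all assumptions. The marking rule follows the Summary: a new segment is marked second-level when its new error log ratio does not exceed the second preset error log ratio, and third-level otherwise.

```python
import random

SECOND_PRESET_RATIO = 0.4  # second preset error log ratio from the embodiment


def regenerate_segments(temporary_db: list[str], count: int,
                        rng: random.Random, parts: int = 2) -> list[str]:
    """Build `count` new segments by randomly connecting stored segments.

    ASSUMPTION: each new segment joins `parts` distinct stored segments
    with a newline; the source does not define the connection rule.
    """
    if len(temporary_db) < parts:
        raise ValueError("not enough segments in the temporary database")
    return ["\n".join(rng.sample(temporary_db, parts)) for _ in range(count)]


def mark_new_segment(new_ratio: float) -> int:
    """New segments are marked second-level (2) or third-level (3) only."""
    return 2 if new_ratio <= SECOND_PRESET_RATIO else 3


rng = random.Random(0)  # fixed seed so the sketch is repeatable
stored = ["a = read()", "b = parse(a)", "exec(b)"]  # hypothetical segments
new_segments = regenerate_segments(stored, count=4, rng=rng)
```

Passing an explicit `random.Random` instance keeps the generation reproducible, which helps when a new segment that exposed a vulnerability has to be rebuilt later.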

Specifically, when the marking of the new sub-code segments is complete, the number of new sub-code segments to generate is determined based on the third-level proportion, that is, the proportion of third-level sub-code segments.

Specifically, when the marking of the sub-code segments is complete, the attack frequency used in the attack test for each sub-code segment is corrected based on the first-level proportion, that is, the proportion of sub-code segments marked as first-level:

If the first-level proportion is less than or equal to a preset first-level proportion, a first correction coefficient z1 is selected to correct the attack frequency V, and the corrected attack frequency is set as V'=z1×V0, where V0 is the initial attack frequency before correction;

If the first-level proportion is greater than the preset first-level proportion, a second correction coefficient z2 is selected to correct the attack frequency V, and the corrected attack frequency is set as V'=z2×V0.

In this embodiment of the present invention, the first correction coefficient z1 is 2 and the second correction coefficient z2 is 2.5.
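The correction can be sketched in the same way as the threshold adjustment. The coefficients are those of this embodiment; the preset first-level proportion is not stated in the description, so 0.5 is an assumed placeholder, and the function name and the frequency unit are likewise illustrative.

```python
Z1 = 2.0  # first correction coefficient
Z2 = 2.5  # second correction coefficient
PRESET_FIRST_LEVEL_RATIO = 0.5  # ASSUMPTION: the source gives no value


def corrected_attack_frequency(v0: float, first_level_ratio: float,
                               preset_ratio: float = PRESET_FIRST_LEVEL_RATIO) -> float:
    """Return V' = z * V0, with z chosen by the first-level proportion."""
    z = Z1 if first_level_ratio <= preset_ratio else Z2
    return z * v0


# Hypothetical initial frequency of 10 attacks per unit time.
print(corrected_attack_frequency(10, first_level_ratio=0.4))
print(corrected_attack_frequency(10, first_level_ratio=0.6))
```

Both coefficients exceed 1, so the attack frequency always increases, and it increases more when a larger share of segments was marked first-level.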

Specifically, when the marking of the sub-code segments is complete, whether to update the component version is determined based on the proportion of third-level sub-code segments.
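A minimal sketch of this decision follows, using the rule stated in the Summary (update when the proportion of third-level sub-code segments is greater than or equal to the preset proportion). The preset proportion is not given in the description, so the value 0.3, the function names, and the sample marks are assumptions.

```python
def third_level_proportion(levels: list[int]) -> float:
    """Share of marked sub-code segments that are third-level."""
    if not levels:
        raise ValueError("no marked sub-code segments")
    return levels.count(3) / len(levels)


def should_update_component(levels: list[int], preset_proportion: float) -> bool:
    """True when the third-level proportion is >= the preset proportion."""
    return third_level_proportion(levels) >= preset_proportion


marks = [1, 1, 2, 3, 3]  # hypothetical marking results
update = should_update_component(marks, preset_proportion=0.3)  # ASSUMED preset
print(update)
```

Here two of five segments are third-level (proportion 0.4), which meets the assumed preset of 0.3, so the component version would be judged outdated and updated.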

Specifically, the attack test comprises: analyzing whether a sub-code segment has a vulnerability based on the matching degree between known vulnerable code segments in big data and the sub-code segment; and testing the sub-code segment using black-box testing.
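The matching-degree part of the attack test could be approximated as below. The description does not define the matching degree, so this sketch substitutes the character-level similarity ratio of Python's standard `difflib.SequenceMatcher`; that choice, the threshold of 0.8, the function names, and the sample entries are all assumptions. The black-box testing part is not shown.

```python
from difflib import SequenceMatcher

MATCH_THRESHOLD = 0.8  # ASSUMPTION: the source gives no matching threshold


def best_match(segment: str, known_vulnerable: list[str]) -> float:
    """Highest similarity between the segment and any known vulnerable one."""
    return max(
        (SequenceMatcher(None, segment, known).ratio() for known in known_vulnerable),
        default=0.0,
    )


def matches_known_vulnerability(segment: str, known_vulnerable: list[str]) -> bool:
    """Flag the segment when its best similarity reaches the threshold."""
    return best_match(segment, known_vulnerable) >= MATCH_THRESHOLD


known = ["strcpy(buf, user_input);"]  # hypothetical vulnerability entries
print(matches_known_vulnerability("strcpy(buf, user_input);", known))
print(matches_known_vulnerability("return 0;", known))
```

A textual similarity ratio is only a stand-in: it catches near-verbatim copies of known vulnerable segments but not semantically equivalent code, which is one reason the description pairs matching with black-box testing.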

The technical solutions of the present invention have now been described with reference to the preferred embodiments shown in the accompanying drawings. However, those skilled in the art will readily understand that the scope of protection of the present invention is not limited to these specific embodiments. Without departing from the principles of the present invention, those skilled in the art may make equivalent changes or substitutions to the relevant technical features, and the technical solutions resulting from such changes or substitutions fall within the scope of protection of the present invention.

The above describes only preferred embodiments of the present invention and is not intended to limit it; those skilled in the art may make various modifications and variations to the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall be included within its scope of protection.

Claims

1. A code defect vulnerability scanning and determination method based on a knowledge base, characterized by comprising the following steps:

Step S1: dividing a target code of a knowledge base into a plurality of sub-code segments;

Step S2: calculating an abnormal tendency evaluation value of the target code based on the code repetition degree of the sub-code segments and the conditional probability between the sub-code segments;

Step S3: analyzing whether the abnormal tendency of the target code is qualified based on the abnormal tendency evaluation value;

Step S4: performing an attack test on target code whose abnormal tendency is unqualified, and marking each sub-code segment;

Step S5: regenerating new sub-code segments based on the second-level sub-code segments, and performing an attack test to mark the new sub-code segments;

Step S6: adjusting the attack frequency for the sub-code segments based on the proportion of first-level sub-code segments.

2. The code defect vulnerability scanning and determination method based on a knowledge base according to claim 1, wherein in step S2 the conditional probabilities between the sub-code segments are calculated, and the abnormal tendency evaluation value S for the target code is calculated from the average value P of the calculated conditional probabilities and the code repetition degree C of the sub-code segments, with S=α×P/P0+β×C/C0, where α is a conditional probability weight coefficient, β is a repetition degree weight coefficient, P0 is a preset average value, and C0 is a preset code repetition degree.

3. The code defect vulnerability scanning and determination method based on a knowledge base according to claim 2, wherein in step S3 whether the abnormal tendency of the target code is qualified is analyzed based on the abnormal tendency evaluation value,

if the abnormal tendency evaluation value is less than or equal to a first preset abnormal tendency evaluation value, the abnormal tendency of the target code is determined to be qualified;

if the abnormal tendency evaluation value is greater than the first preset abnormal tendency evaluation value, the abnormal tendency of the target code is determined to be unqualified, an attack test is performed on each sub-code segment, and a test log is recorded.

4. The code defect vulnerability scanning and determination method based on a knowledge base according to claim 3, wherein in step S4, when the attack test on each sub-code segment is complete, each sub-code segment is marked based on the error log ratio of the attack test for that sub-code segment,

if the error log ratio is less than or equal to a first preset error log ratio, the sub-code segment is marked as a first-level sub-code segment;

if the error log ratio is greater than the first preset error log ratio and less than or equal to a second preset error log ratio, the sub-code segment is marked as a second-level sub-code segment and stored in a temporary database;

if the error log ratio is greater than the second preset error log ratio, the sub-code segment is marked as a third-level sub-code segment and stored in a vulnerability database.

5. The code defect vulnerability scanning and determination method based on a knowledge base according to claim 4, wherein, when the marking of the sub-code segments is complete, the first preset abnormal tendency evaluation value is adjusted based on the quantity proportion of the second-level sub-code segments.

6. The code defect vulnerability scanning and determination method based on a knowledge base according to claim 5, wherein, when the marking of the sub-code segments is complete, the sub-code segments in the temporary database are randomly connected to randomly generate a plurality of new sub-code segments, an attack test is performed on each new sub-code segment, and each new sub-code segment is marked based on its new error log ratio after the attack test,

if the new error log ratio is less than or equal to the second preset error log ratio, the new sub-code segment is marked as a second-level sub-code segment and stored in the temporary database;

if the new error log ratio is greater than the second preset error log ratio, the new sub-code segment is marked as a third-level sub-code segment and stored in the vulnerability database.

7. The code defect vulnerability scanning and determination method based on a knowledge base according to claim 6, wherein, when the marking of the new sub-code segments is complete, the number of new sub-code segments to generate is determined based on the third-level proportion of the third-level sub-code segments,

if the third-level proportion is less than or equal to a preset third-level proportion, the number of new sub-code segments to generate is determined to be a first-level number;

if the third-level proportion is greater than the preset third-level proportion, the number of new sub-code segments to generate is determined to be a second-level number.

8. The code defect vulnerability scanning and determination method based on a knowledge base according to claim 7, wherein, when the marking of the sub-code segments is complete, the attack frequency used during the attack test for each sub-code segment is corrected based on the first-level proportion of the marked first-level sub-code segments.

9. The code defect vulnerability scanning and determination method based on a knowledge base according to claim 8, wherein, when the marking of the sub-code segments is complete, whether to update a component version is determined based on the proportion of the third-level sub-code segments,

if the proportion of the third-level sub-code segments is greater than or equal to a preset proportion of third-level sub-code segments, the component version is determined to be outdated and the component version is updated;

if the proportion of the third-level sub-code segments is less than the preset proportion of third-level sub-code segments, the component version is determined to be normal and the component version is not updated.

10. The code defect vulnerability scanning and determination method based on a knowledge base according to claim 9, wherein the attack test comprises: analyzing whether a sub-code segment has a vulnerability based on the matching degree between known vulnerable code segments in big data and the sub-code segment; and testing the sub-code segment using black-box testing.