CN112069069A

CN112069069A - Defect automatic location analysis method, device and readable storage medium

Info

Publication number: CN112069069A
Application number: CN202010920498.9A
Authority: CN
Inventors: 黄蕾
Original assignee: Ping An Trust Co Ltd
Current assignee: Ping An Trust Co Ltd
Priority date: 2020-09-03
Filing date: 2020-09-03
Publication date: 2020-12-11

Abstract

The invention relates to automated testing tools and provides an automatic defect location analysis method, device and medium. Defect data to be analyzed is first matched exactly against a defect knowledge base using a text similarity algorithm, so that the cause and repair scheme of a common defect already recorded in the knowledge base can be retrieved quickly. When exact matching fails for the defect data to be analyzed, fuzzy matching is performed, so that when no exact result can be found, the best match available in the current knowledge base is returned first. When the accurate cause and repair scheme of the defect data to be verified are later obtained, the knowledge base is updated and the defect data to be verified is calibrated accordingly. This refines analysis results that were obtained only through fuzzy matching and adaptively optimizes the knowledge base, further improving the efficiency of defect location analysis. In addition, the invention relates to blockchain technology: the defect data to be verified can be stored in a blockchain.

Description

Defect automatic location analysis method, device and readable storage medium

Technical field

The present invention relates to the technical field of software testing, and in particular to an automatic defect location analysis method, a device, and a computer-readable storage medium.

Background art

Software developers and software testers alike deal with software defects in their daily work. Before a project goes live, testers need to find as many program defects and functional problems as possible. Developers then fix them one by one, so that the project is delivered with high quality and stability according to users' actual requirements. This process often takes a great deal of time. How to quickly and accurately investigate and locate the cause of a defect, and repair it effectively and in a targeted manner, is therefore a long-standing problem for developers and testers.

At present, the industry investigates and locates software defects mainly by hand, for example by querying logs and analyzing captured packets. The efficiency of this work depends on the personal experience of testers and developers, and the locating process is relatively blind and difficult. As a result, defects are often not repaired in time after they are discovered, which blocks development and testing progress. Existing approaches to software defect location analysis therefore suffer from the technical problem of low efficiency.

Summary of the invention

The main purpose of the present invention is to provide an automatic defect location analysis method, a device, and a computer-readable storage medium. These aim to solve the technical problem that existing software defect location analysis approaches are inefficient.

To achieve the above purpose, the present invention provides an automatic defect location analysis method, which comprises the following steps:

acquiring defect data to be analyzed and extracting keywords from it, so as to exactly match the keywords in a preset defect knowledge base based on a preset text similarity algorithm, wherein the defect knowledge base contains multi-category defect sample data together with the defect causes and repair schemes corresponding to that data;

when it is detected that the current exact match has failed, identifying the defect category to which the defect data to be analyzed belongs, and searching the defect knowledge base for the target defect cause and target repair scheme corresponding to that category, so as to complete fuzzy matching of the defect data to be analyzed;

taking the defect data to be analyzed after fuzzy matching as defect data to be verified and, when the accurate defect cause and accurate repair scheme of the defect data to be verified are obtained, updating them into the defect knowledge base so as to calibrate the defect data to be verified.
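The calibration step can be pictured as a small state transition on a knowledge-base record. The following is a minimal sketch under stated assumptions, not the patent's implementation. The `DefectRecord` and `DefectKnowledgeBase` classes and their fields are hypothetical. The knowledge base is also an in-memory dictionary rather than the store (optionally blockchain-backed) that the invention describes.

```python
from dataclasses import dataclass, field


@dataclass
class DefectRecord:
    # Hypothetical record layout; the patent does not define a schema.
    defect_id: str
    description: str
    category: str
    cause: str
    fix: str
    verified: bool = False  # False while the result comes only from fuzzy matching


@dataclass
class DefectKnowledgeBase:
    records: dict = field(default_factory=dict)  # calibrated defect records
    pending: dict = field(default_factory=dict)  # defect data to be verified

    def add_fuzzy_result(self, record: DefectRecord) -> None:
        """Queue a fuzzily matched defect as defect data to be verified."""
        self.pending[record.defect_id] = record

    def calibrate(self, defect_id: str, accurate_cause: str, accurate_fix: str) -> DefectRecord:
        """Replace the provisional cause/fix with the accurate ones and store the record."""
        record = self.pending.pop(defect_id)
        record.cause = accurate_cause
        record.fix = accurate_fix
        record.verified = True
        self.records[defect_id] = record
        return record
```

Keeping unverified results in a separate `pending` map means a provisional fuzzy answer never overwrites a verified entry until someone supplies the accurate cause and fix.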

Optionally, the step of identifying the defect category and searching for the target defect cause and target repair scheme comprises the following. This is the step performed when it is detected that the current exact match has failed: the defect category to which the defect data to be analyzed belongs is identified, and the defect knowledge base is searched for the corresponding target defect cause and target repair scheme, so as to complete the fuzzy matching:

when it is detected that the current exact match has failed, identifying entity information in the defect data to be analyzed using a preset entity recognition model, and obtaining a question template based on the entity information;

performing multi-level semantic parsing on the defect data to be analyzed to obtain its multi-level semantics;

predicting the defect category in the defect knowledge base to which the defect data to be analyzed corresponds, using a preset probabilistic graphical model in combination with the question template and the multi-level semantics;

converting, according to the defect category and the entity information, the defect data to be analyzed into a structured query against the defect knowledge base, and obtaining the target defect cause and target repair scheme from the query, so as to complete the fuzzy matching of the defect data to be analyzed.
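The last two fuzzy-matching steps, category prediction and the structured query, can be sketched as follows. The patent names a probabilistic graphical model but does not say which one. This sketch substitutes multinomial naive Bayes, a simple directed graphical model, for category prediction. It skips entity recognition and multi-level semantic parsing, taking pre-tokenized text and an entity string as given. It also stores the knowledge base in a hypothetical SQLite table `defects(id, category, description, cause, fix)`.

```python
import math
import sqlite3
from collections import Counter, defaultdict


def train_naive_bayes(samples):
    """Fit a multinomial naive Bayes model from (tokens, category) pairs."""
    token_counts = defaultdict(Counter)  # category -> token frequencies
    category_counts = Counter()          # category -> number of samples
    vocabulary = set()
    for tokens, category in samples:
        token_counts[category].update(tokens)
        category_counts[category] += 1
        vocabulary.update(tokens)
    return token_counts, category_counts, vocabulary


def predict_category(tokens, model):
    """Return the category with the highest log-posterior (Laplace smoothing)."""
    token_counts, category_counts, vocabulary = model
    total = sum(category_counts.values())
    best_category, best_score = None, -math.inf
    for category, n in category_counts.items():
        denominator = sum(token_counts[category].values()) + len(vocabulary)
        score = math.log(n / total)  # log prior
        for token in tokens:
            score += math.log((token_counts[category][token] + 1) / denominator)
        if score > best_score:
            best_category, best_score = category, score
    return best_category


def create_schema(conn):
    """Hypothetical knowledge-base table; the patent does not specify a schema."""
    conn.execute(
        "CREATE TABLE defects (id INTEGER PRIMARY KEY, category TEXT, "
        "description TEXT, cause TEXT, fix TEXT)"
    )


def query_cause_and_fix(conn, category, entity):
    """Structured query: rows of the predicted category, those mentioning the entity first."""
    return conn.execute(
        "SELECT cause, fix FROM defects WHERE category = ? "
        "ORDER BY (instr(description, ?) > 0) DESC, id LIMIT 1",
        (category, entity),
    ).fetchone()
```

The query returns `None` when the predicted category has no rows. A caller could treat that as "no fuzzy match available" and fall back to manual analysis.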

Optionally, the method further comprises the following, after the calibration step. In that step, the defect data to be analyzed after fuzzy matching is taken as defect data to be verified and, once its accurate defect cause and accurate repair scheme are obtained, they are updated into the defect knowledge base to calibrate it. The further steps are:

applying a feature mark to the calibrated defect data so that it serves as feature defect data;

when it is detected that fuzzy matching is being performed, preferentially selecting the feature defect data for matching.
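One way to read "preferentially select the feature defect data" is as a strict ordering: any feature-marked (calibrated) candidate beats any unmarked one, and the fuzzy-match score breaks ties within each group. The minimal sketch below assumes that interpretation. The `Candidate` type and its fields are hypothetical.

```python
from typing import NamedTuple, Optional, Sequence


class Candidate(NamedTuple):
    defect_id: str
    score: float          # similarity/category score produced by fuzzy matching
    feature_marked: bool  # True once the record has been calibrated


def pick_fuzzy_match(candidates: Sequence[Candidate]) -> Optional[Candidate]:
    """Prefer feature-marked candidates; within each group pick the highest score."""
    if not candidates:
        return None
    # Tuples compare element by element, and True > False, so any marked
    # candidate outranks every unmarked one regardless of score.
    return max(candidates, key=lambda c: (c.feature_marked, c.score))
```

A softer alternative would add a fixed bonus to the score of marked records, so a much better unmarked match could still win. The patent text does not say which behavior is intended.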

Optionally, before the step of acquiring defect data to be analyzed and extracting keywords from it, the method further comprises:

acquiring defect sample data, and pre-screening and format-converting it to obtain target sample data;

classifying the target sample data based on a preset classification algorithm to obtain multi-category defect sample data corresponding to multiple defect categories;

extracting and screening the multi-category defect sample data to obtain the defect cause and repair scheme corresponding to each defect category, and establishing a mapping among defect category, defect cause, and repair scheme;

when it is detected that the amount of multi-category defect sample data reaches a preset data-volume threshold, constructing the defect knowledge base so that automatic defect location analysis can be performed based on it.
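The knowledge-base construction steps (pre-screening, classification, mapping, and the data-volume threshold) can be sketched as a small pipeline. This is a minimal sketch with several assumptions. The keyword rules in `CATEGORY_RULES` are a hypothetical stand-in for the unspecified "preset classification algorithm". Format conversion is reduced to whitespace normalization. The threshold is checked on the number of samples that survive pre-screening.

```python
from collections import defaultdict

# Hypothetical keyword rules standing in for a trained classifier.
CATEGORY_RULES = {
    "UI display": ("page", "button", "layout"),
    "interface error": ("api", "timeout", "500"),
    "data write": ("insert", "table", "write"),
}


def prescreen(raw_samples):
    """Drop samples missing a description, cause or fix; normalize whitespace."""
    cleaned = []
    for sample in raw_samples:
        if not all(sample.get(key) for key in ("description", "cause", "fix")):
            continue
        cleaned.append({key: " ".join(str(sample[key]).split())
                        for key in ("description", "cause", "fix")})
    return cleaned


def classify(description):
    """Assign the first category whose keywords appear in the description."""
    words = description.lower().split()
    for category, keywords in CATEGORY_RULES.items():
        if any(keyword in words for keyword in keywords):
            return category
    return "other"


def build_knowledge_base(raw_samples, min_samples=100):
    """Return {category: [(cause, fix), ...]}, or None below the volume threshold."""
    samples = prescreen(raw_samples)
    if len(samples) < min_samples:
        return None  # not enough data yet to build the knowledge base
    kb = defaultdict(list)
    for sample in samples:
        pair = (sample["cause"], sample["fix"])
        category = classify(sample["description"])
        if pair not in kb[category]:  # keep each cause/fix mapping once per category
            kb[category].append(pair)
    return dict(kb)
```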

Optionally, the text similarity algorithm includes a cosine distance algorithm, and

the step of acquiring defect data to be analyzed, extracting keywords from it, and exactly matching the keywords in the preset defect knowledge base based on the preset text similarity algorithm comprises:

performing word segmentation on the defect data to be analyzed using natural language processing techniques, so as to extract its keywords;

generating a first set of term-frequency vectors of the keywords in the defect data to be analyzed, and a second set of term-frequency vectors of the keywords in the defect knowledge base;

obtaining, with the cosine distance algorithm, a set of cosine similarities between the first and second term-frequency vectors, so as to exactly match the keywords in the defect knowledge base based on that set.
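The term-frequency and cosine-similarity steps can be sketched as follows. For each knowledge-base sample, the code builds term-frequency vectors for the query keywords and the sample over their shared vocabulary. It then computes the cosine similarity, cos θ = (a · b) / (‖a‖ ‖b‖). Cosine distance is 1 minus this value. Tokenization is assumed to have happened already, and the sample ids and token lists are hypothetical.

```python
import math
from collections import Counter


def tf_vector(tokens, vocabulary):
    """Term-frequency vector of `tokens` over a fixed, ordered vocabulary."""
    counts = Counter(tokens)
    return [counts[word] for word in vocabulary]


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def similarity_set(query_tokens, kb_samples):
    """Map each knowledge-base sample id to its cosine similarity with the query.

    kb_samples: hypothetical mapping of sample id -> list of tokens.
    """
    result = {}
    for sample_id, sample_tokens in kb_samples.items():
        vocabulary = sorted(set(query_tokens) | set(sample_tokens))
        result[sample_id] = cosine_similarity(
            tf_vector(query_tokens, vocabulary), tf_vector(sample_tokens, vocabulary)
        )
    return result
```

Building the vocabulary per pair keeps each vector short. Precomputing a global vocabulary would be the usual choice for a large knowledge base.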

Optionally, after the step of obtaining the set of cosine similarities and exactly matching the keywords in the defect knowledge base based on it, the method further comprises:

determining whether the set of cosine similarities contains a target cosine similarity that exceeds a preset similarity threshold;

if so, determining that the current exact match has succeeded;

if not, determining that the current exact match has failed.

Optionally, after the step of acquiring defect data to be analyzed, extracting keywords from it, and exactly matching the keywords in the preset defect knowledge base based on the preset text similarity algorithm, the method further comprises:

when it is detected that the set of cosine similarities contains a target cosine similarity exceeding the preset similarity threshold, acquiring the matching defect sample data in the defect knowledge base that corresponds to the target cosine similarity;

acquiring the matching defect cause and matching repair scheme corresponding to the matching defect sample data, and displaying them in association with each other;

at preset time intervals, periodically selecting from the defect knowledge base the common defect data whose actual number of recurrences exceeds a preset count threshold. The common defect categories, common defect causes, and common repair schemes corresponding to that data are then determined, so as to generate a visual statistics table of common defects.
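The periodic common-defect statistics can be sketched as a counting step followed by a plain-text table. This is a minimal sketch with stated assumptions. Each recurrence is assumed to be logged as the id of the knowledge-base sample it matched. The `kb` mapping is hypothetical. A fixed-width text table stands in for the "visual" table, which a real system might render as a chart. Running the job at the preset interval (for example from a cron-style scheduler) is left out.

```python
from collections import Counter


def common_defect_table(occurrences, kb, min_recurrences):
    """Rows (id, category, cause, fix, count) for samples recurring more than the threshold.

    occurrences: iterable of matched sample ids, one entry per recurrence.
    kb: hypothetical mapping of sample id -> (category, cause, fix).
    """
    counts = Counter(occurrences)
    rows = [
        (sample_id, *kb[sample_id], n)
        for sample_id, n in counts.items()
        if n > min_recurrences and sample_id in kb  # "exceeds" taken as strictly greater
    ]
    rows.sort(key=lambda row: (-row[-1], row[0]))  # most frequent first, then by id
    return rows


def render_table(rows):
    """Render the rows as a fixed-width text table."""
    header = f"{'id':<8}{'category':<18}{'cause':<24}{'fix':<24}{'count':>5}"
    lines = [header]
    for sample_id, category, cause, fix, count in rows:
        lines.append(f"{sample_id:<8}{category:<18}{cause:<24}{fix:<24}{count:>5}")
    return "\n".join(lines)
```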

In addition, to achieve the above purpose, the present invention further provides an automatic defect location analysis apparatus, which comprises:

an exact matching module, configured to acquire defect data to be analyzed, extract keywords from it, and exactly match the keywords in a preset defect knowledge base based on a preset text similarity algorithm, wherein the defect knowledge base contains multi-category defect sample data together with the corresponding defect causes and repair schemes;

a fuzzy matching module, configured to identify, when it is detected that the current exact match has failed, the defect category to which the defect data to be analyzed belongs, and to search the defect knowledge base for the target defect cause and target repair scheme corresponding to that category, so as to complete fuzzy matching of the defect data to be analyzed;

a defect calibration module, configured to take the defect data to be analyzed after fuzzy matching as defect data to be verified and, when its accurate defect cause and accurate repair scheme are obtained, to update them into the defect knowledge base so as to calibrate the defect data to be verified.

Optionally, the fuzzy matching module comprises:

a question template acquisition unit, configured to identify, when it is detected that the current exact match has failed, entity information in the defect data to be analyzed using a preset entity recognition model, and to obtain a question template based on the entity information;

a multi-level semantic parsing unit, configured to perform multi-level semantic parsing on the defect data to be analyzed to obtain its multi-level semantics;

a defect category prediction unit, configured to predict the defect category in the defect knowledge base to which the defect data to be analyzed corresponds, using a preset probabilistic graphical model in combination with the question template and the multi-level semantics;

a fuzzy matching completion unit, configured to convert, according to the defect category and the entity information, the defect data to be analyzed into a structured query against the defect knowledge base, and to obtain the target defect cause and target repair scheme from the query, so as to complete the fuzzy matching of the defect data to be analyzed.

Optionally, the automatic defect location analysis apparatus further comprises:

a feature marking module, configured to apply a feature mark to the calibrated defect data so that it serves as feature defect data;

a priority matching module, configured to preferentially select the feature defect data for matching when it is detected that fuzzy matching is being performed;

a screening and conversion module, configured to acquire defect sample data, and to pre-screen and format-convert it to obtain target sample data;

a data classification module, configured to classify the target sample data based on a preset classification algorithm to obtain multi-category defect sample data corresponding to multiple defect categories;

a mapping establishment module, configured to extract and screen the multi-category defect sample data to obtain the defect cause and repair scheme corresponding to each defect category, and to establish a mapping among defect category, defect cause, and repair scheme;

a knowledge base construction module, configured to construct the defect knowledge base when it is detected that the amount of multi-category defect sample data reaches a preset data-volume threshold, so that automatic defect location analysis can be performed based on it.

The exact matching module comprises:

a keyword extraction unit, configured to perform word segmentation on the defect data to be analyzed using natural language processing techniques, so as to extract its keywords;

a term-frequency generation unit, configured to generate a first set of term-frequency vectors of the keywords in the defect data to be analyzed, and a second set of term-frequency vectors of the keywords in the defect knowledge base;

an exact matching unit, configured to obtain, with the cosine distance algorithm, a set of cosine similarities between the first and second term-frequency vectors, so as to exactly match the keywords in the defect knowledge base based on that set.

An exact match judgment module, configured to determine whether the set of cosine similarities contains a target cosine similarity that exceeds a preset similarity threshold;

a matching sample acquisition module, configured to acquire the matching defect sample data in the defect knowledge base that corresponds to the target cosine similarity. It does so when it is detected that the set of cosine similarities contains a target cosine similarity exceeding the preset similarity threshold;

a result association display module, configured to acquire the matching defect cause and matching repair scheme corresponding to the matching defect sample data, and to display them in association with each other;

a periodic statistics module, configured to periodically select from the defect knowledge base, at preset time intervals, the common defect data whose actual number of recurrences exceeds a preset count threshold. It then determines the common defect categories, common defect causes, and common repair schemes corresponding to that data, so as to generate a visual statistics table of common defects.

In addition, to achieve the above purpose, the present invention further provides an automatic defect location analysis device. The device comprises a processor, a memory, and an automatic defect location analysis program stored in the memory and executable by the processor. When executed by the processor, the program implements the steps of the automatic defect location analysis method described above.

In addition, to achieve the above purpose, the present invention further provides a computer-readable storage medium storing an automatic defect location analysis program. When executed by a processor, the program implements the steps of the automatic defect location analysis method described above.

The present invention provides an automatic defect location analysis method, device, and computer-readable storage medium. The method first exactly matches the defect data to be analyzed using a text similarity algorithm, so that common defects already recorded in the defect knowledge base can be matched quickly to their causes and repair schemes. When exact matching fails for the current defect data to be analyzed, fuzzy matching is performed. The broad class in the knowledge base that matches the defect data, namely the defect type, is found first. The cause and repair scheme corresponding to that defect type are then returned as the fuzzy matching result, so that when no exact result can be found, the closest result available in the current knowledge base is given first. When the accurate cause and repair scheme of the defect data to be verified are obtained, the knowledge base is updated and the defect data to be verified is calibrated accordingly. This refines the analysis results of defect data that was matched only through fuzzy matching and adaptively optimizes the knowledge base. The efficiency of defect location analysis is further improved, solving the technical problem that existing software defect location analysis approaches are inefficient.
In addition, compared with traditional manual methods of locating defects, such as querying logs and analyzing captured packets, the method self-calibrates its location results through the adaptive optimization process. Applied in the software development and testing workflow, it helps developers and testers locate program defects quickly, provides reliable and effective repair suggestions, and enables rapid repair. This matters most for large projects with complex business scenarios and multi-person collaborative development and testing. In such settings, the method can significantly shorten the time spent investigating and locating defects, improve development and testing efficiency, and help ensure the quality and reliability of the software.

Description of the drawings

FIG. 1 is a schematic diagram of the hardware structure of the automatic defect location analysis device involved in the embodiments of the present invention;

FIG. 2 is a schematic flowchart of a first embodiment of the automatic defect location analysis method of the present invention;

FIG. 3 is a schematic diagram of the functional modules of the automatic defect location analysis apparatus of the present invention.

The realization of the purpose, the functional features, and the advantages of the present invention will be further described with reference to the embodiments and the accompanying drawings.

Detailed description of the embodiments

It should be understood that the specific embodiments described here are intended only to explain the present invention, not to limit it.

The automatic defect location analysis method of the embodiments of the present invention is mainly applied to an automatic defect location analysis device. This may be any device with display and processing capabilities, such as a PC, a portable computer, or a mobile terminal.

Referring to FIG. 1, FIG. 1 is a schematic diagram of the hardware structure of the automatic defect location analysis device involved in the embodiments of the present invention. In these embodiments, the device may include a processor 1001 (e.g., a CPU), a communication bus 1002, a user interface 1003, a network interface 1004, and a memory 1005. The communication bus 1002 implements connection and communication among these components. The user interface 1003 may include a display and an input unit such as a keyboard. The network interface 1004 may optionally include a standard wired interface and a wireless interface (such as a Wi-Fi interface). The memory 1005 may be high-speed RAM or non-volatile memory such as disk storage, and may optionally be a storage device independent of the processor 1001.

Those skilled in the art will understand that the hardware structure shown in FIG. 1 does not limit the automatic defect location analysis device. The device may include more or fewer components than shown, combine certain components, or arrange the components differently.

Still referring to FIG. 1, the memory 1005, as a computer-readable storage medium, may include an operating system, a network communication module, and an automatic defect location analysis program.

In FIG. 1, the network communication module is mainly used to connect to a server and exchange data with it. The processor 1001 may invoke the automatic defect location analysis program stored in the memory 1005 to execute the automatic defect location analysis method provided by the embodiments of the present invention.

Based on the above hardware structure, embodiments of the automatic defect location analysis method of the present invention are proposed.

To solve the above problem, the present invention provides an automatic defect location analysis method. The method first exactly matches the defect data to be analyzed using a text similarity algorithm, so that common defects already recorded in the defect knowledge base can be matched quickly to their causes and repair schemes. When exact matching fails for the current defect data to be analyzed, fuzzy matching is performed. The broad class in the knowledge base that matches the defect data, namely the defect type, is found first. The cause and repair scheme corresponding to that defect type are then returned as the fuzzy matching result, so that when no exact result can be found, the closest result available in the current knowledge base is given first. When the accurate cause and repair scheme of the defect data to be verified are obtained, the knowledge base is updated and the defect data to be verified is calibrated accordingly. This refines the analysis results of defect data that was matched only through fuzzy matching and adaptively optimizes the knowledge base. The efficiency of defect location analysis is further improved, solving the technical problem that existing software defect location analysis approaches are inefficient.
In addition, compared with traditional manual methods of locating defects, such as querying logs and analyzing captured packets, the method self-calibrates its location results through the adaptive optimization process. Applied in the software development and testing workflow, it helps developers and testers locate program defects quickly, provides reliable and effective repair suggestions, and enables rapid repair. This matters most for large projects with complex business scenarios and multi-person collaborative development and testing. In such settings, the method can significantly shorten the time spent investigating and locating defects, improve development and testing efficiency, and help ensure the quality and reliability of the software.

Referring to FIG. 2, FIG. 2 is a schematic flowchart of the first embodiment of the automatic defect location analysis method of the present invention.

The first embodiment of the present invention provides an automatic defect location analysis method, comprising the following steps:

Step S10: acquire the defect data to be analyzed and extract keywords from it, so as to exactly match the keywords in the preset defect knowledge base based on a preset text similarity algorithm, wherein the defect knowledge base contains multi-category defect sample data together with the corresponding defect causes and repair schemes;

In this embodiment, the defect data to be analyzed is typically defect data that software developers or testers encounter in their daily work. It can be obtained from a defect management platform, an automated testing platform, a security vulnerability scanning tool, and the like. Keywords are the words in the defect data to be analyzed that carry actual meaning. Before extraction, words without actual meaning, such as modal particles and stop words, can be filtered out of the defect data to make keyword extraction more efficient. The preset text similarity algorithm may be, for example, Euclidean distance, cosine similarity, Jaccard distance, edit distance, or Hamming distance; this embodiment does not limit the choice. Multi-category defect sample data is defect sample data belonging to multiple defect categories, such as abnormal UI display, interface errors, errors when writing data to tables, and security vulnerabilities.
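The stop-word filtering described above can be sketched for English-language defect descriptions. This is a minimal sketch under stated assumptions. Tokenization is a simple regular expression over lower-cased letters, digits and underscores, and `STOP_WORDS` is a small hypothetical list. For Chinese descriptions, a word segmenter such as jieba would replace the regex tokenizer. Duplicates are kept on purpose, because the later term-frequency step needs the counts.

```python
import re

# Hypothetical stop-word list; a real deployment would load a curated list.
STOP_WORDS = {
    "the", "a", "an", "is", "was", "when", "after", "on", "of", "to", "and", "please", "oh",
}


def extract_keywords(description):
    """Lower-case, tokenize, and drop stop words, keeping order and repeats."""
    tokens = re.findall(r"[a-z0-9_]+", description.lower())
    return [token for token in tokens if token not in STOP_WORDS]
```

For example, "The order page is blank after clicking the Pay button" yields `["order", "page", "blank", "clicking", "pay", "button"]`.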

本方法应用于装载有缺陷自动定位分析程序的缺陷自动定位分析系统(以下简称系统)。在用户向缺陷管理平台提交了新的缺陷数据时，系统会接收到缺陷管理平台上报的当前需要分析的缺陷数据，并触发与系统中预设的缺陷知识库的自动对比。系统会根据当前所提交的待分析缺陷的现象描述，先进行关键词提取，具体提取手段通常为自然语言处理。系统在提取出待分析缺陷的现象描述的关键词后，利用该关键词与缺陷知识库中的缺陷样本数据进行精确匹配。具体的精确匹配方式通常为获取关键字与各缺陷样本数据的相似度，并将相似度与预设相似度阈值进行比较。若相似度超出阈值，则系统可判定其与待分析缺陷数据精确匹配成功；若相似度未超出阈值，则系统可判定其与待分析缺陷数据精确匹配失败。若知识库中存在多个与待分析缺陷数据的关键字之间的相似度超出阈值的情况，则选取相似度最高的缺陷样本数据作为精确匹配对象。The method is applied to an automatic defect location analysis system (hereinafter "the system") on which an automatic defect location analysis program is installed. When a user submits new defect data to the defect management platform, the system receives the defect data to be analyzed that the platform reports and triggers an automatic comparison against the defect knowledge base preset in the system. The system first extracts keywords from the symptom description of the submitted defect, typically using natural language processing. After extracting the keywords, the system uses them to perform exact matching against the defect sample data in the defect knowledge base. Exact matching is typically done by computing the similarity between the keywords and each defect sample and comparing that similarity with a preset similarity threshold. If the similarity exceeds the threshold, the system determines that the sample exactly matches the defect data to be analyzed; if it does not exceed the threshold, the system determines that exact matching with that sample fails. If the similarities of several defect samples in the knowledge base exceed the threshold, the defect sample with the highest similarity is selected as the exact match.
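The threshold-and-best-candidate rule above can be sketched as follows. This is a minimal illustration, not the patent's implementation: it assumes each knowledge-base sample has already been reduced to a keyword set, and it uses Jaccard similarity (one of the options listed for the preset text similarity algorithm) purely as a stand-in. The function names and sample IDs are hypothetical.

```python
from typing import Dict, Optional, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity |A & B| / |A | B| of two keyword sets (0.0 if both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def exact_match(
    keywords: Set[str],
    knowledge_base: Dict[str, Set[str]],
    threshold: float,
) -> Optional[Tuple[str, float]]:
    """Return (sample_id, similarity) of the most similar sample whose
    similarity exceeds the threshold, or None if exact matching fails."""
    best: Optional[Tuple[str, float]] = None
    for sample_id, sample_keywords in knowledge_base.items():
        score = jaccard(keywords, sample_keywords)
        # Only samples above the threshold are exact-match candidates;
        # among several candidates, keep the highest-scoring one.
        if score > threshold and (best is None or score > best[1]):
            best = (sample_id, score)
    return best


if __name__ == "__main__":
    kb = {  # hypothetical sample IDs and keyword sets
        "BUG-1": {"login", "timeout", "interface"},
        "BUG-2": {"button", "misaligned", "ui"},
    }
    print(exact_match({"login", "timeout", "error"}, kb, threshold=0.4))  # ('BUG-1', 0.5)
    print(exact_match({"database", "write"}, kb, threshold=0.4))  # None
```

A `None` result corresponds to the "exact match failed" branch that hands the defect over to fuzzy matching in step S20.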

步骤S20，在检测到当前精确匹配失败时，识别所述待分析缺陷数据所属的缺陷类别，并在所述缺陷知识库中查找与所述缺陷类别对应的目标缺陷原因以及目标修复方案，以完成对所述待分析缺陷数据的模糊匹配；Step S20, when it is detected that the current exact match fails, identify the defect category to which the defect data to be analyzed belongs, and search the defect knowledge base for the target defect cause and target repair scheme corresponding to the defect category, to complete the process. Fuzzy matching to the defect data to be analyzed;

在本实施例中，系统在对待分析缺陷数据进行精确分析失败时(具体可为知识库中不存在与关键词相似度超出预设阈值的多类别缺陷样本数据)，继续对其进行模糊匹配。系统将此待分析缺陷数据作为预设分类算法的输入，得到此待分析缺陷数据的所属缺陷类别。系统在缺陷知识库中找到此缺陷类别，以及与此缺陷类别对应的目标缺陷原因与目标修复方案，作为本次模糊匹配的结果。系统可直接将目标缺陷原因与目标修复方案在前端进行显示，以供用户直接获取。需要说明的是，在实际应用时，缺陷类别的识别方式可为基于分类算法的方式，或是通过将待识别缺陷数据转化为缺陷知识库的结构化查询的方式。分类算法可为支持向量机(SVM，Support Vector Machine)、随机森林、决策树、最近邻节点算法(K-NN，K-NearestNeighbor)等，本实施例不做具体限定。In this embodiment, when exact matching of the defect data to be analyzed fails (specifically, when the knowledge base contains no defect sample data whose similarity to the keywords exceeds the preset threshold), the system goes on to perform fuzzy matching. The system feeds the defect data to be analyzed into a preset classification algorithm to obtain the defect category to which it belongs. It then looks up this defect category in the defect knowledge base, together with the corresponding target defect cause and target repair scheme, as the result of this fuzzy match. The system can display the target defect cause and target repair scheme directly on the front end for the user. Note that, in practice, the defect category may be identified either with a classification algorithm or by converting the defect data to be identified into a structured query against the defect knowledge base. The classification algorithm may be a support vector machine (SVM), a random forest, a decision tree, k-nearest neighbors (K-NN), etc., which is not specifically limited in this embodiment.
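Fuzzy matching by category can be sketched with K-NN, one of the classifiers the text lists. This is an illustrative sketch under stated assumptions: labelled samples are keyword sets, Jaccard similarity stands in for whatever distance a real deployment would use, and `category_solutions` is a hypothetical mapping from each category to its (cause, repair scheme) pair in the knowledge base.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

Sample = Tuple[Set[str], str]  # (keyword set, defect category)


def jaccard(a: Set[str], b: Set[str]) -> float:
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def knn_category(keywords: Set[str], samples: List[Sample], k: int = 3) -> str:
    """Predict the defect category by majority vote among the k most similar
    labelled samples (assumes `samples` is non-empty)."""
    ranked = sorted(samples, key=lambda s: jaccard(keywords, s[0]), reverse=True)
    votes = Counter(category for _, category in ranked[:k])
    return votes.most_common(1)[0][0]


def fuzzy_match(
    keywords: Set[str],
    samples: List[Sample],
    category_solutions: Dict[str, Tuple[str, str]],
    k: int = 3,
) -> Tuple[str, str, str]:
    """Return (category, target cause, target repair scheme) for the predicted category."""
    category = knn_category(keywords, samples, k)
    cause, fix = category_solutions[category]
    return category, cause, fix
```

With samples such as `({"button", "misaligned", "page"}, "UI display exception")`, a query like `{"page", "button", "overlap"}` is assigned the UI category, and the cause and repair scheme stored for that category are returned as the best available answer.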

步骤S30，将模糊匹配后的待分析缺陷数据作为待校验缺陷数据，并在获取到所述待校验缺陷数据的准确缺陷原因与准确修复方案时，将所述准确缺陷原因与准确修复方案更新至所述缺陷知识库中，以对所述待校准缺陷数据进行校准。Step S30: take the fuzzy-matched defect data to be analyzed as defect data to be verified and, when the accurate defect cause and accurate repair scheme of the defect data to be verified are obtained, update them into the defect knowledge base so as to calibrate that defect data.

在本实施例中，待校验缺陷数据为进行精确匹配失败后，进行模糊匹配得到结果的缺陷数据。In this embodiment, the defect data to be verified is defect data obtained by performing fuzzy matching after the exact matching fails.

系统将模糊匹配后的待分析缺陷数据打上待校验标识，以作为待校验缺陷数据。在缺陷验收时，系统收集待校验缺陷数据的准确缺陷原因与准确修复方案，将其与待校验缺陷数据关联并更新至缺陷知识库中，以对待校验缺陷数据进行校验。下一次模糊匹配时将优先比对知识库中的特征缺陷，从而实现自适应的定位结果校准。系统也可将已经过校验的缺陷数据以及其对应的缺陷原因、修复方案与缺陷类别作为新的训练数据集对分类算法进行优化训练，以提升分类算法的识别准确性。同时系统会对代码进行diff定位，分析出现缺陷的最近一次提交代码，并将有变更的代码段标红，开发人员可以参考分析结果和修复建议快速完成缺陷的修复。The system tags the fuzzy-matched defect data to be analyzed with a "to be verified" flag, making it defect data to be verified. At defect acceptance, the system collects the accurate defect cause and accurate repair scheme of the defect data to be verified, associates them with that defect data, and updates them into the defect knowledge base, thereby verifying the defect data. In the next fuzzy match, the feature defects in the knowledge base are compared first, which achieves adaptive calibration of the location results. The system can also use the verified defect data, together with the corresponding defect causes, repair schemes, and defect categories, as a new training set to retrain the classification algorithm and improve its recognition accuracy. In addition, the system diffs the code to locate the most recently committed code in which the defect appears and marks the changed code segments in red, so that developers can use the analysis results and repair suggestions to fix the defect quickly.
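The verify-then-calibrate bookkeeping can be sketched as a small in-memory store. The record fields, flag names, and helper functions below are hypothetical; a real system would persist these records in its knowledge base, and retraining the classifier on calibrated records is not shown.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class DefectRecord:
    defect_id: str
    category: str
    cause: Optional[str] = None
    fix: Optional[str] = None
    pending_verification: bool = False  # "to be verified" flag set after fuzzy matching
    feature: bool = False  # set once calibrated; compared first in later fuzzy matches


def record_fuzzy_result(kb: Dict[str, DefectRecord], defect_id: str, category: str, cause: str, fix: str) -> None:
    """Store a fuzzy-matched result and flag it as defect data to be verified."""
    kb[defect_id] = DefectRecord(defect_id, category, cause, fix, pending_verification=True)


def calibrate(kb: Dict[str, DefectRecord], defect_id: str, accurate_cause: str, accurate_fix: str) -> None:
    """At defect acceptance, overwrite the fuzzy result with the accurate cause and fix."""
    record = kb[defect_id]
    record.cause = accurate_cause
    record.fix = accurate_fix
    record.pending_verification = False
    record.feature = True


def fuzzy_candidates(kb: Dict[str, DefectRecord], category: str) -> List[DefectRecord]:
    """Records of one category, with calibrated (feature) records first."""
    records = [r for r in kb.values() if r.category == category]
    return sorted(records, key=lambda r: not r.feature)  # False sorts before True
```

Sorting on `not r.feature` is one simple way to express "compare feature defects first"; the stable sort keeps the original order within each group.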

作为一具体实施例，缺陷自动定位分析系统首先进行采集数据，采集日常测试发现的BUG数据，待修复后收集问题原因、修复方案，形成质量闭环；然后再建立映射，建立问题表象、问题原因与问题修复方案三者之间的映射关系，根据问题表象及分类快速定位原因，给出修复建议；再构建知识库，维护系统常见问题集，定期统计分析，识别质量隐患，共享编码经验；最后进行自适应优化，自动定位结果校准，通过自适应过程，不断优化定位准确性。As a specific embodiment, the automatic defect location analysis system first collects data: it collects the bug data found in daily testing and, once each bug is fixed, collects its cause and repair scheme, forming a closed quality loop. It then establishes a mapping among the problem symptom, the problem cause, and the repair scheme, so that the cause can be located quickly from the symptom and category and a repair suggestion given. Next, it builds a knowledge base that maintains the system's set of common problems, performs regular statistical analysis to identify hidden quality risks, and shares coding experience. Finally, it performs adaptive optimization, calibrating the automatic location results and continuously improving location accuracy through the adaptive process.

在本实施例中，本发明通过获取待分析缺陷数据，并提取所述待分析缺陷数据中的关键词，以基于预设文本相似度算法将所述关键词在预设的缺陷知识库中进行精确匹配，其中，所述缺陷知识库包含多类别缺陷样本数据，以及与所述多类别缺陷样本数据对应的缺陷原因与修复方案；在检测到当前精确匹配失败时，识别所述待分析缺陷数据所属的缺陷类别，并在所述缺陷知识库中查找与所述缺陷类别对应的目标缺陷原因以及目标修复方案，以完成对所述待分析缺陷数据的模糊匹配；将模糊匹配后的待分析缺陷数据作为待校验缺陷数据，并在获取到所述待校验缺陷数据的准确缺陷原因与准确修复方案时，将所述准确缺陷原因与准确修复方案更新至所述缺陷知识库中，以对所述待校准缺陷数据进行校准。通过上述方式，本发明通过先对待分析的缺陷数据基于文本相似度算法进行精确匹配，使得在面对缺陷知识库中已有的常见缺陷问题时能够快速匹配得到缺陷原因与修复方案；通过在精确匹配方式对当前的待分析缺陷数据无效时，进一步进行模糊匹配，先查找知识库中与缺陷数据相匹配的大类也即是缺陷类型，再对应找到与缺陷类型对应的原因与修复方案以作为模糊匹配的结果，使得未能找到精确结果时，先给出一个能在当前知识库中找到的最相近的结果；通过在得到待校验缺陷数据的准确原因与修复方案时基于此更新知识库并校准待校验缺陷数据，使得能够完善仅通过模糊匹配的缺陷数据的缺陷分析结果，并对知识库进行自适应优化，以进一步提升缺陷定位分析效率，从而解决了现有的软件缺陷定位分析方式的效率低下的技术问题。另外，与传统的查询日志、抓包分析等手工定位缺陷方法相比，所述缺陷自动定位分析方法能通过自适应优化过程实现定位结果的自行校准，在应用于软件开发测试流程中，能很好的帮助开发、测试人员快速定位程序缺陷，提供可靠有效的修复建议，实现缺陷的快速修复。尤其是程序项目规模较大，业务场景复杂，涉及多人协作开发测试时，更能显著的缩短缺陷排查定位时间，提升开发测试效率，保证软件程序的高质量和可靠性。In this embodiment, the present invention acquires the defect data to be analyzed and extracts its keywords, so as to exactly match the keywords in the preset defect knowledge base based on the preset text similarity algorithm, where the defect knowledge base contains multi-category defect sample data and the defect causes and repair schemes corresponding to that data; when it detects that the current exact match has failed, it identifies the defect category to which the defect data to be analyzed belongs and looks up the target defect cause and target repair scheme corresponding to that category in the defect knowledge base, completing fuzzy matching of the defect data; it then takes the fuzzy-matched defect data as defect data to be verified and, when the accurate defect cause and accurate repair scheme of that data are obtained, updates them into the defect knowledge base so as to calibrate the data.
In this way, the present invention first exactly matches the defect data to be analyzed using a text similarity algorithm, so that for common defects already in the defect knowledge base, the defect cause and repair scheme can be matched quickly. When exact matching does not work for the current defect data, fuzzy matching is performed: the system first finds the broad category in the knowledge base that matches the defect data, that is, the defect type, and then takes the cause and repair scheme corresponding to that type as the fuzzy matching result, so that when no exact result can be found, the closest result available in the current knowledge base is given first. By updating the knowledge base and calibrating the defect data to be verified once its accurate cause and repair scheme are obtained, the analysis results of defect data that was only fuzzy-matched can be refined and the knowledge base is adaptively optimized, further improving the efficiency of defect location analysis and thereby solving the technical problem that existing software defect location analysis is inefficient. In addition, compared with traditional manual methods of locating defects, such as querying logs and analyzing captured packets, the automatic defect location analysis method can self-calibrate its location results through the adaptive optimization process. Applied to the software development and testing process, it helps developers and testers locate program defects quickly, provides reliable and effective repair suggestions, and enables rapid defect repair.
Especially when the scale of the program project is large, the business scenario is complex, and it involves multi-person collaborative development and testing, it can significantly shorten the time for defect investigation and positioning, improve the efficiency of development and testing, and ensure the high quality and reliability of software programs.

进一步地，基于上述图2所示的第一实施例，提出本发明缺陷自动定位分析方法的第二实施例。本实施例中，步骤S20包括：Further, based on the first embodiment shown in FIG. 2, a second embodiment of the automatic defect location analysis method of the present invention is proposed. In this embodiment, step S20 includes:

在本实施例中，系统在检测到当前精确匹配失败时(例如余弦相似度最高值低于预设相似度阈值)，使用基于长短时记忆(LSTM，Long Short－Term Memory)－条件随机场(CRF，conditional random field)的实体识别模型可以从待分析缺陷数据的自然语言部分中识别出实体部分，可以将自然语句中的实体映射到对应的概念上，而且将自然语言概念化可以辅助模型学习到更精准的语义信息。系统可将缺陷知识库转化为知识图谱，通过知识图谱中的属性(关系类型)进行属性扩展获取得到更多的关系表达。通过对实体概念映射以及对知识库图谱的属性扩展从而获取得到高质量的问题模板。In this embodiment, when the system detects that the current exact match has failed (for example, the highest cosine similarity is below the preset similarity threshold), it uses an entity recognition model based on Long Short-Term Memory (LSTM) and a conditional random field (CRF) to identify the entities in the natural-language portion of the defect data to be analyzed and map them to their corresponding concepts; conceptualizing the natural language in this way helps the model learn more precise semantic information. The system can convert the defect knowledge base into a knowledge graph and expand the attributes (relation types) in the graph to obtain more relational expressions. High-quality question templates are obtained through this entity-to-concept mapping and attribute expansion of the knowledge graph.

系统先对待识别缺陷数据进行多层次语义解析，再使用预设的概率图模型综合使用问题的语义解析结果，以及通过获取到的问题模板，预测出待识别缺陷数据对应到知识图谱中的属性类型，最后根据得到的属性类型以及模型识别出的待识别缺陷数据中的实体等信息，将待识别缺陷数据转换成知识图谱的结构化查询，从缺陷知识库中查询得到当前与待识别缺陷数据最匹配的目标缺陷原因与目标修复方案，以完成对所述待分析缺陷数据的模糊匹配。The system first performs multi-level semantic parsing of the defect data to be identified. It then uses a preset probabilistic graphical model, which combines the semantic parsing results with the obtained question templates, to predict the attribute type in the knowledge graph that the defect data corresponds to. Finally, based on the predicted attribute type and the entities the model recognized in the defect data, the system converts the defect data into a structured query over the knowledge graph and retrieves from the defect knowledge base the target defect cause and target repair scheme that best match the defect data, completing the fuzzy matching of the defect data to be analyzed.
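The final step, turning recognized entities plus a predicted attribute type into a structured query, can be illustrated over a toy triple store. This sketch assumes the LSTM-CRF recognizer and the probabilistic graphical model have already produced their outputs; the relation names `involves`, `has_cause`, and `has_fix` are hypothetical, not the patent's schema.

```python
from typing import List, Optional, Sequence, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def match(
    triples: List[Triple],
    s: Optional[str] = None,
    p: Optional[str] = None,
    o: Optional[str] = None,
) -> List[Triple]:
    """Triple-pattern lookup; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s) and (p is None or t[1] == p) and (o is None or t[2] == o)
    ]


def structured_query(triples: List[Triple], entities: Sequence[str], attribute: str) -> List[str]:
    """Find defects linked to every recognized entity, then return the values of
    the predicted attribute (e.g. a cause or repair relation) for those defects."""
    defects: Optional[Set[str]] = None
    for entity in entities:
        linked = {t[0] for t in match(triples, p="involves", o=entity)}
        defects = linked if defects is None else defects & linked
    if not defects:
        return []
    return sorted(t[2] for d in defects for t in match(triples, s=d, p=attribute))
```

For example, with entities `["login interface", "timeout"]` and predicted attribute `"has_cause"`, only defects that involve both entities contribute their cause to the result.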

进一步地，步骤S30之后，还包括：Further, after step S30, it also includes:

本实施例中，系统将校准后的待校准缺陷数据打上特征标识作为特征缺陷数据，在下一次模糊匹配时将优先比对知识库中的特征缺陷数据，从而实现自适应的定位结果校准。同时系统会对代码进行diff定位，分析出现缺陷的最近一次提交代码，并将有变更的代码段标红，开发同事可以参考分析结果和修复建议快速完成缺陷的修复。In this embodiment, the system tags the calibrated defect data with a feature flag, making it feature defect data, and in the next fuzzy match compares the feature defect data in the knowledge base first, thereby achieving adaptive calibration of the location results. At the same time, the system diffs the code, analyzes the most recently committed code in which the defect appears, and marks the changed code segments in red, so that developers can use the analysis results and repair suggestions to fix the defect quickly.
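The "diff and mark changed segments in red" step can be approximated with Python's standard `difflib`. This sketch assumes the old and new versions of a file are already available as strings (for instance, read with `git show <rev>:<path>`); how the system actually picks the relevant commit is not specified in the text, and red is rendered here with an ANSI escape code as a terminal stand-in for front-end highlighting.

```python
import difflib
from typing import List, Tuple

RED, RESET = "\033[31m", "\033[0m"


def changed_lines(old_src: str, new_src: str) -> List[Tuple[int, str]]:
    """Return (1-based line number in the new version, text) for added or modified lines."""
    old_lines = old_src.splitlines()
    new_lines = new_src.splitlines()
    matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines)
    changed = []
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        if tag in ("replace", "insert"):  # deletions leave nothing to mark in the new version
            changed.extend((j + 1, new_lines[j]) for j in range(j1, j2))
    return changed


def highlight(new_src: str, changed: List[Tuple[int, str]]) -> str:
    """Wrap the changed lines of the new version in red ANSI color codes."""
    changed_nos = {n for n, _ in changed}
    return "\n".join(
        f"{RED}{line}{RESET}" if n in changed_nos else line
        for n, line in enumerate(new_src.splitlines(), start=1)
    )
```

Using `SequenceMatcher` on line lists rather than a unified diff makes it easy to map each change back to a line number in the new file, which is what the highlighting needs.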

需要强调的是，为进一步保证上述特征缺陷数据的私密和安全性，上述特征缺陷数据还可以存储于一区块链的节点中。It should be emphasized that, in order to further ensure the privacy and security of the above-mentioned characteristic defect data, the above-mentioned characteristic defect data may also be stored in a node of a blockchain.

进一步地，步骤S10之前，还包括：Further, before step S10, it also includes:

在本实施例中，缺陷样本数据可从缺陷管理平台、自动化测试平台以及安全漏洞扫描工具中采集得到。格式转换操作的目的是将可能属于不同类型的缺陷样本数据转换为统一格式。预筛选方式具体可为根据样本缺陷卡片中的特定标识进行筛选。转换之后的统一格式具体可为JSON格式、字符串格式等。预设分类算法具体可为支持向量机(SVM，Support Vector Machine)、随机森林、决策树、最近邻节点算法(K-NN，K-NearestNeighbor)等。In this embodiment, defect sample data can be collected from a defect management platform, an automated testing platform, and security vulnerability scanning tools. The format conversion operation converts defect sample data, which may be of different types, into a unified format. Pre-screening may specifically be based on a specific flag in the sample defect card. The unified format after conversion may be, for example, JSON or a plain string. The preset classification algorithm may specifically be a support vector machine (SVM), a random forest, a decision tree, k-nearest neighbors (K-NN), etc.

具体地，系统在前期构建缺陷知识库时，先从缺陷管理平台、自动化测试平台、安全漏洞扫描工具中采集缺陷样本数据，维护基本的缺陷信息，包括缺陷现象描述、问题产生的原因，问题修复的方法，其中问题原因和修复方法初始阶段需要在缺陷卡片验收时人工维护；数据采集完成后进入到处理层，首先经过预处理，筛除一些如测试误报，或实际并非缺陷的无效数据，然后把不同平台采集到的缺陷数据进行转换成统一格式，接着通过KNN分类算法处理，将缺陷筛选分为一些大类，如UI展示异常、接口报错、数据写表错误、安全漏洞等；数据处理完成后到达分析层，在这里会对采集的缺陷数据进行分析，建立每个缺陷表象、问题原因，解决方法三者之间的映射关系，同时维护在知识库中；当知识库中的缺陷样本数据维护到一定规模，就可以将此时的知识库作为可直接使用的缺陷知识库，执行自动定位分析流程。Specifically, when building the defect knowledge base in the early stage, the system first collects defect sample data from the defect management platform, automated testing platform, and security vulnerability scanning tools, and maintains basic defect information, including the defect symptom description, the cause of the problem, and the repair method; initially, the cause and repair method must be entered manually when the defect card is accepted. After collection, the data enters the processing layer. It is first preprocessed to filter out invalid data, such as test false positives or reports that are not actually defects; the defect data collected from different platforms is then converted into a unified format and processed with the KNN classification algorithm, which sorts the defects into several major categories, such as UI display exception, interface error, data table write error, and security vulnerability. After processing, the data reaches the analysis layer, where the collected defect data is analyzed and the mapping among each defect symptom, problem cause, and solution is established and maintained in the knowledge base. Once the defect sample data in the knowledge base reaches a sufficient size, the knowledge base can be used directly as the defect knowledge base and the automatic location analysis process can be run.
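The preprocessing order described above (filter invalid cards first, then convert to one format) can be sketched as follows. Every source name, field name, and flag value here is a hypothetical placeholder: real defect platforms and scanners expose their own schemas, and JSON is used because the text names it as one possible unified format.

```python
import json
from typing import Dict, List, Tuple

# Hypothetical card flags meaning "not a real defect" and hypothetical per-source field names.
INVALID_FLAGS = {"false_positive", "not_a_bug"}
FIELD_MAP = {
    "defect_platform": {"id": "bug_id", "description": "summary", "flag": "resolution"},
    "scanner": {"id": "vuln_id", "description": "title", "flag": "status"},
}


def is_valid(record: Dict[str, str], source: str) -> bool:
    """Pre-screening: drop records whose card flag marks them as invalid."""
    return record.get(FIELD_MAP[source]["flag"], "") not in INVALID_FLAGS


def to_unified(record: Dict[str, str], source: str) -> Dict[str, str]:
    """Map a source-specific record onto one shared set of field names."""
    fields = FIELD_MAP[source]
    return {"id": record[fields["id"]], "description": record[fields["description"]], "source": source}


def preprocess(raw: List[Tuple[str, Dict[str, str]]]) -> List[str]:
    """Filter invalid records first, then serialize the rest as unified JSON strings."""
    return [
        json.dumps(to_unified(rec, src), ensure_ascii=False, sort_keys=True)
        for src, rec in raw
        if is_valid(rec, src)
    ]
```

`ensure_ascii=False` keeps Chinese defect descriptions readable in the stored JSON; the unified records would then be passed to the KNN categorization step.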

在本实施例中，预设时间间隔为用于定期统计常见缺陷的时长，预设次数阈值为判断缺陷是否常见的临界值，均可根据实际需求灵活设定与更改，例如，可取一周、半个月等。系统按照预设时间间隔，如一个月，每月生成常见缺陷的可视化报表，可以定期统计分析常见缺陷的归类、问题原因、重现次数、修复建议，更好的识别软件项目的质量隐患，针对性的采取措施解决，并对一些反复多次出现的同类型缺陷重点标识。In this embodiment, the preset time interval is the period over which common defects are counted, and the preset count threshold is the critical value for deciding whether a defect is common; both can be set and changed flexibly as needed, for example one week or half a month. At the preset interval, for example monthly, the system generates a visual report of common defects. This allows the categories, causes, recurrence counts, and repair suggestions of common defects to be analyzed regularly, so that hidden quality risks in the software project are better identified and addressed with targeted measures, and defects of the same type that recur repeatedly are highlighted.
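The periodic statistics step amounts to counting recurrences inside a time window and keeping the defects whose count exceeds the threshold. In this sketch each occurrence is assumed to be logged as a (defect signature, timestamp) pair, and `kb_info` is a hypothetical lookup from signature to (category, cause, repair scheme); rendering the visual report is left out.

```python
from collections import Counter
from datetime import datetime, timedelta
from typing import Dict, List, Tuple


def common_defects(
    occurrences: List[Tuple[str, datetime]],
    now: datetime,
    interval: timedelta,
    min_count: int,
) -> List[Tuple[str, int]]:
    """Count occurrences per defect signature within [now - interval, now] and keep
    those whose recurrence count exceeds min_count, most frequent first."""
    start = now - interval
    counts = Counter(sig for sig, ts in occurrences if start <= ts <= now)
    return [(sig, n) for sig, n in counts.most_common() if n > min_count]


def report_rows(common: List[Tuple[str, int]], kb_info: Dict[str, Tuple[str, str, str]]) -> List[Dict]:
    """Join each common defect with its category, cause, and repair scheme for the report."""
    rows = []
    for sig, n in common:
        category, cause, fix = kb_info[sig]
        rows.append({"signature": sig, "category": category, "cause": cause, "fix": fix, "recurrences": n})
    return rows
```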

进一步地，通过融合问题模板以及多层次的语义解析技术，可以有效的降低模型的复杂度，并且可以深层次理解待分析缺陷数据，提高了模型的精度；通过将进行模糊匹配的缺陷数据打上特征标识，下一次模糊匹配时将优先比对知识库中的特征缺陷，从而实现自适应的定位结果校准；通过预先构建缺陷知识库，并在使用过程中不断补充优化该缺陷知识库，使得能够实现缺陷修复经验的积累，形成闭环管理，不断提高基于缺陷知识库所进行的缺陷自动定位分析的效率；通过定期生成常见缺陷问题的统计表，使得能够将缺陷自动定位过程中的代表性结果进行定期展示，便于针对性的采取措施解决，逐步避免常见问题的出现。Further, combining question templates with multi-level semantic parsing effectively reduces model complexity, allows the defect data to be analyzed to be understood in depth, and improves model accuracy. By tagging fuzzy-matched defect data with a feature flag, the feature defects in the knowledge base are compared first in the next fuzzy match, achieving adaptive calibration of the location results. By building the defect knowledge base in advance and continuously supplementing and optimizing it during use, defect repair experience accumulates, closed-loop management is formed, and the efficiency of automatic defect location analysis based on the knowledge base keeps improving. By regularly generating statistical tables of common defects, representative results of the automatic location process are displayed periodically, which makes it easier to take targeted measures and gradually prevent common problems from recurring.

进一步地，基于上述图2所示的第一实施例，提出本发明缺陷自动定位分析方法的第三实施例。本实施例中，所述文本相似度算法包括余弦距离算法，步骤S10包括：Further, based on the first embodiment shown in FIG. 2 above, a third embodiment of the automatic defect location analysis method of the present invention is proposed. In this embodiment, the text similarity algorithm includes a cosine distance algorithm, and step S10 includes:

在本实施例中，基于自然语言处理技术，首先把给定的文本(待分析缺陷数据中的文字部分)按照完整句子进行分割，即：对于每个句子，进行分词和词性标注处理，并过滤掉停用词,只保留指定词性的单词,如名词、动词等，构建候选关键词图G＝(V,E)，其中V为节点集,由候选关键词组成；然后根据TextRank(一种关键词提取算法)的公式(TextRank是受到Google的PageRank的启发，通过把文本分割成若干组成单元(单词、句子)并建立图模型,利用投票机制对文本中的重要成分进行排序,仅利用单篇文档本身的信息即可实现关键词提取)，迭代传播各节点的权重，直至收敛。最后，对节点权重进行倒序排序,从而得到最重要的T个单词,作为候选关键词。In this embodiment, based on natural language processing, the given text (the text portion of the defect data to be analyzed) is first split into complete sentences. Each sentence is segmented into words and part-of-speech tagged, stop words are filtered out, and only words with specified parts of speech, such as nouns and verbs, are kept. A candidate keyword graph G = (V, E) is then built, where the node set V consists of the candidate keywords. Next, the weight of each node is propagated iteratively according to the TextRank formula until convergence. (TextRank is a keyword extraction algorithm inspired by Google's PageRank: it splits text into units such as words or sentences, builds a graph model, and ranks the important components of the text by a voting mechanism, so keywords can be extracted using only the information in a single document.) Finally, the nodes are sorted by weight in descending order, and the top T words are taken as the candidate keywords.
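A compact TextRank sketch follows. It assumes sentence splitting, word segmentation, POS tagging, and stop-word removal have already produced a list of candidate tokens (the segmenter is not shown), and it links two words when they co-occur within `window` consecutive tokens. The update is the standard unweighted TextRank rule WS(Vi) = (1 − d) + d · Σ WS(Vj) / |N(Vj)|; the parameter defaults are illustrative, not values from the patent.

```python
from typing import Dict, List, Set


def textrank_keywords(
    tokens: List[str],
    window: int = 2,
    damping: float = 0.85,
    top_t: int = 5,
    tol: float = 1e-6,
    max_iter: int = 100,
) -> List[str]:
    """Rank pre-filtered candidate words with TextRank and return the top T."""
    if not tokens:
        return []
    # Build the undirected candidate-keyword graph G = (V, E): two words are
    # linked if they co-occur within `window` consecutive tokens.
    neighbors: Dict[str, Set[str]] = {w: set() for w in tokens}
    for i, w in enumerate(tokens):
        for j in range(i + 1, min(i + window, len(tokens))):
            if tokens[j] != w:
                neighbors[w].add(tokens[j])
                neighbors[tokens[j]].add(w)
    # Iterate WS(Vi) = (1 - d) + d * sum_{Vj in N(Vi)} WS(Vj) / |N(Vj)| until convergence.
    scores = {w: 1.0 for w in neighbors}
    for _ in range(max_iter):
        new_scores = {
            w: (1 - damping) + damping * sum(scores[u] / len(neighbors[u]) for u in neighbors[w])
            for w in neighbors
        }
        delta = max(abs(new_scores[w] - scores[w]) for w in neighbors)
        scores = new_scores
        if delta < tol:
            break
    # Sort nodes by weight in descending order (ties broken alphabetically).
    return sorted(scores, key=lambda w: (-scores[w], w))[:top_t]
```

For the token list `["login", "timeout", "login", "error", "login", "page"]`, "login" sits at the center of the co-occurrence graph and comes out on top.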

在本实施例中，第一词频向量集为，当前提取出的关键词在待分析缺陷数据中的词频向量的集合，第二词频向量集为当前提取出的关键词在缺陷知识库中的词频向量的集合。系统基于余弦距离计算原理，将当前提取出的关键词作为一个关键词集合，分别计算两方对于这个关键词集合中关键词的词频，生成两方各自的词频向量(第一词频向量与第二词频向量)；最后再计算两个向量集合的余弦相似度，将各个向量的余弦相似度值合并至一个集合，余弦相似度越大就表示越相似。系统可将知识库中相似度最高(高于预设相似度阈值)的缺陷样本数据所对应的问题原因与修复方案作为当前的精确匹配结果。In this embodiment, the first word frequency vector set is the set of word frequency vectors of the currently extracted keywords in the defect data to be analyzed, and the second word frequency vector set is the set of word frequency vectors of those keywords in the defect knowledge base. Following the cosine distance principle, the system treats the currently extracted keywords as one keyword set, counts how often each keyword in the set occurs on each side, and generates a word frequency vector for each side (the first and second word frequency vectors). It then computes the cosine similarity between the two vector sets and collects the resulting values into a cosine similarity set; the larger the cosine similarity, the more similar the texts. The system can take the problem cause and repair scheme of the defect sample with the highest similarity in the knowledge base (provided it is above the preset similarity threshold) as the current exact match result.
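The word-frequency vectors and their cosine similarity, cos(u, v) = (u · v) / (‖u‖ ‖v‖), can be sketched as below. This assumes each knowledge-base sample is available as a token list keyed by a hypothetical sample ID; the vectors are indexed by the extracted keyword set, and the best sample is accepted only if it exceeds the threshold.

```python
import math
from collections import Counter
from typing import Dict, List, Optional, Sequence, Tuple


def tf_vector(keywords: Sequence[str], tokens: Sequence[str]) -> List[int]:
    """Term-frequency vector of `tokens`, restricted to the extracted keyword set."""
    counts = Counter(tokens)
    return [counts[k] for k in keywords]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # no keyword occurs on one side
    return dot / (norm_u * norm_v)


def cosine_similarities(
    keywords: Sequence[str], query_tokens: Sequence[str], kb_tokens: Dict[str, List[str]]
) -> Dict[str, float]:
    """Cosine similarity between the first vector (defect under analysis) and
    the second vector of each knowledge-base sample."""
    first = tf_vector(keywords, query_tokens)
    return {sid: cosine(first, tf_vector(keywords, toks)) for sid, toks in kb_tokens.items()}


def best_exact_match(similarities: Dict[str, float], threshold: float) -> Optional[Tuple[str, float]]:
    """Highest-similarity sample above the threshold, or None (exact match failed)."""
    hits = [(sid, s) for sid, s in similarities.items() if s > threshold]
    return max(hits, key=lambda p: p[1]) if hits else None
```

With keywords `["login", "timeout", "error"]`, a defect text containing "login" twice plus "timeout" and "error" has the vector [2, 1, 1]; against a sample vector [2, 1, 0] the similarity is 5/√30 ≈ 0.913, so that sample passes a 0.8 threshold.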

进一步地，所述使用余弦距离算法获取所述第一词频向量与所述第二词频向量之间的余弦相似度集合，以基于所述余弦相似度集合将所述关键词在所述缺陷知识库中进行精确匹配的步骤之后，还包括：Further, the cosine similarity set between the first word frequency vector and the second word frequency vector is obtained by using the cosine distance algorithm, so as to put the keyword in the defect knowledge base based on the cosine similarity set. After the exact matching steps in , it also includes:

在本实施例中，预设相似度阈值可根据实际需求灵活设置，本实施例不作具体限定。系统判断余弦相似度集合中是否存在超出预设相似度阈值的目标余弦相似度，若余弦相似度集合中存在超出预设相似度阈值的目标余弦相似度，则判定当前精确匹配成功；若余弦相似度集合中不存在超出预设相似度阈值的目标余弦相似度，也即是各个词频向量的余弦相似度均低于预设相似度阈值，则判定当前精确匹配失败，继而可转向模糊匹配。In this embodiment, the preset similarity threshold can be set flexibly according to actual needs and is not specifically limited. The system determines whether the cosine similarity set contains a target cosine similarity that exceeds the preset similarity threshold. If it does, the current exact match is determined to have succeeded; if it does not, that is, the cosine similarity of every word frequency vector is below the preset similarity threshold, the current exact match is determined to have failed, and the system can then switch to fuzzy matching.

进一步地，步骤S10之后，还包括：Further, after step S10, it also includes:

在本实施例中，系统在检测到余弦相似度集合中存在超出预设相似度阈值的目标余弦相似度时，直接获取缺陷知识库中与目标余弦相似度对应的匹配缺陷样本数据，并获取匹配缺陷样本数据在知识库中所对应的匹配问题原因以及匹配修复方案，将其直接在前端关联显示，以便用户直接进行查看。In this embodiment, when the system detects a target cosine similarity in the cosine similarity set that exceeds the preset similarity threshold, it directly retrieves the matching defect sample data corresponding to that target cosine similarity from the defect knowledge base, obtains the matching problem cause and matching repair scheme associated with that sample, and displays them together on the front end so that the user can view them directly.

本发明所指区块链是分布式数据存储、点对点传输、共识机制、加密算法等计算机技术的新型应用模式。区块链(Blockchain)，本质上是一个去中心化的数据库，是一串使用密码学方法相关联产生的数据块，每一个数据块中包含了一批次网络交易的信息，用于验证其信息的有效性(防伪)和生成下一个区块。区块链可以包括区块链底层平台、平台产品服务层以及应用服务层等。The blockchain referred to in the present invention is a new application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanism, and encryption algorithm. Blockchain, essentially a decentralized database, is a series of data blocks associated with cryptographic methods. Each data block contains a batch of network transaction information to verify its Validity of information (anti-counterfeiting) and generation of the next block. The blockchain can include the underlying platform of the blockchain, the platform product service layer, and the application service layer.

进一步地，通过采用自然语言处理技术与余弦距离原理来进行精确匹配，使得精确匹配过程更加准确高效；通过设置相似度阈值来判定精确匹配的成功与否，使得精确匹配的结果判定更加简单易行；通过在精确匹配成功时将匹配结果进行关联显示，用户直接进行查看，提升了用户体验。Further, using natural language processing and the cosine distance principle for exact matching makes the exact matching process more accurate and efficient; using a similarity threshold to decide whether exact matching succeeds makes the result simple to determine; and displaying the associated matching results when exact matching succeeds lets users view them directly, which improves the user experience.

此外，如图3所示，为实现上述目的，本发明还提供一种缺陷自动定位分析装置，所述缺陷自动定位分析装置包括：In addition, as shown in FIG. 3 , in order to achieve the above purpose, the present invention also provides an automatic defect positioning and analysis device, and the automatic defect positioning and analysis device includes:

精确匹配模块10，用于获取待分析缺陷数据，并提取所述待分析缺陷数据中的关键词，以基于预设文本相似度算法将所述关键词在预设的缺陷知识库中进行精确匹配，其中，所述缺陷知识库包含多类别缺陷样本数据，以及与所述多类别缺陷样本数据对应的缺陷原因与修复方案；The exact matching module 10 is used for acquiring defect data to be analyzed, and extracting keywords in the defect data to be analyzed, so as to accurately match the keywords in a preset defect knowledge base based on a preset text similarity algorithm , wherein the defect knowledge base includes multi-category defect sample data, and defect causes and repair schemes corresponding to the multi-category defect sample data;

模糊匹配模块20，用于在检测到当前精确匹配失败时，识别所述待分析缺陷数据所属的缺陷类别，并在所述缺陷知识库中查找与所述缺陷类别对应的目标缺陷原因以及目标修复方案，以完成对所述待分析缺陷数据的模糊匹配；The fuzzy matching module 20 is used for identifying the defect category to which the defect data to be analyzed belongs when the current exact matching failure is detected, and searching the defect knowledge base for the target defect cause and target repair corresponding to the defect category scheme to complete the fuzzy matching of the defect data to be analyzed;

缺陷校准模块30，用于将模糊匹配后的待分析缺陷数据作为待校验缺陷数据，并在获取到所述待校验缺陷数据的准确缺陷原因与准确修复方案时，将所述准确缺陷原因与准确修复方案更新至所述缺陷知识库中，以对所述待校准缺陷数据进行校准。The defect calibration module 30 is configured to take the fuzzy-matched defect data to be analyzed as defect data to be verified and, when the accurate defect cause and accurate repair scheme of the defect data to be verified are obtained, update them into the defect knowledge base so as to calibrate that defect data.

可选地，所述模糊匹配模块20包括：Optionally, the fuzzy matching module 20 includes:

所述精确匹配模块10包括：The exact matching module 10 includes:

可选地，所述缺陷自动定位分析装置还包括：Optionally, the defect automatic positioning analysis device further includes:

定期统计模块，用于按照预设时间间隔，定时从所述缺陷知识库中筛选出实际重现次数超出预设次数阈值的常见缺陷数据，并确定所述常见缺陷数据对应的常见缺陷类别、常见缺陷原因与常见修复方案，以生成可视化常见缺陷统计表。A periodic statistics module, configured to screen out from the defect knowledge base, at a preset time interval, the common defect data whose actual recurrence count exceeds a preset count threshold, and to determine the common defect category, common defect cause, and common repair scheme corresponding to that data, so as to generate a visual statistical table of common defects.

本发明还提供一种缺陷自动定位分析设备。The invention also provides a defect automatic positioning analysis device.

所述缺陷自动定位分析设备包括处理器、存储器及存储在所述存储器上并可在所述处理器上运行的缺陷自动定位分析程序，其中所述缺陷自动定位分析程序被所述处理器执行时，实现如上所述的缺陷自动定位分析方法的步骤。The automatic defect location analysis device includes a processor, a memory, and an automatic defect location analysis program stored on the memory and executable on the processor, wherein the automatic defect location analysis program is executed by the processor. , to realize the steps of the automatic defect location analysis method as described above.

其中，所述缺陷自动定位分析程序被执行时所实现的方法可参照本发明缺陷自动定位分析方法的各个实施例，此处不再赘述。Wherein, for the method implemented when the automatic defect location analysis program is executed, reference may be made to the various embodiments of the automatic defect location analysis method of the present invention, which will not be repeated here.

此外，本发明实施例还提供一种计算机可读存储介质。In addition, an embodiment of the present invention further provides a computer-readable storage medium.

本发明计算机可读存储介质上存储有缺陷自动定位分析程序，其中所述缺陷自动定位分析程序被处理器执行时，实现如上述的缺陷自动定位分析方法的步骤。The computer-readable storage medium of the present invention stores an automatic defect location analysis program, wherein when the defect automatic location analysis program is executed by the processor, the steps of the above-mentioned method for automatic defect location analysis are implemented.

其中，缺陷自动定位分析程序被执行时所实现的方法可参照本发明缺陷自动定位分析方法的各个实施例，此处不再赘述。The method implemented when the automatic defect location analysis program is executed may refer to the various embodiments of the automatic defect location analysis method of the present invention, which will not be repeated here.

需要说明的是，在本文中，术语“包括”、“包含”或者其任何其他变体意在涵盖非排他性的包含，从而使得包括一系列要素的过程、方法、物品或者系统不仅包括那些要素，而且还包括没有明确列出的其他要素，或者是还包括为这种过程、方法、物品或者系统所固有的要素。在没有更多限制的情况下，由语句“包括一个……”限定的要素，并不排除在包括该要素的过程、方法、物品或者系统中还存在另外的相同要素。It should be noted that, herein, the terms "comprising", "comprising" or any other variation thereof are intended to encompass non-exclusive inclusion, such that a process, method, article or system comprising a series of elements includes not only those elements, It also includes other elements not expressly listed or inherent to such a process, method, article or system. Without further limitation, an element qualified by the phrase "comprising a..." does not preclude the presence of additional identical elements in the process, method, article or system that includes the element.

上述本发明实施例序号仅仅为了描述，不代表实施例的优劣。The above-mentioned serial numbers of the embodiments of the present invention are only for description, and do not represent the advantages or disadvantages of the embodiments.

通过以上的实施方式的描述，本领域的技术人员可以清楚地了解到上述实施例方法可借助软件加必需的通用硬件平台的方式来实现，当然也可以通过硬件，但很多情况下前者是更佳的实施方式。基于这样的理解，本发明的技术方案本质上或者说对现有技术做出贡献的部分可以以软件产品的形式体现出来，该计算机软件产品存储在如上所述的一个存储介质(如ROM/RAM、磁碟、光盘)中，包括若干指令用以使得一台终端设备(可以是手机，计算机，服务器，空调器，或者网络设备等)执行本发明各个实施例所述的方法。From the description of the above embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, or by hardware, although in many cases the former is the better implementation. Based on this understanding, the technical solution of the present invention, in essence or in the part that contributes to the prior art, can be embodied as a software product. The computer software product is stored in a storage medium as described above (such as ROM/RAM, a magnetic disk, or an optical disk) and includes several instructions that cause a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, a network device, etc.) to execute the methods described in the embodiments of the present invention.

The above are only preferred embodiments of the present invention and do not thereby limit its patent scope. Any equivalent structure or equivalent process transformation made using the contents of the description and drawings of the present invention, whether applied directly or indirectly in other related technical fields, is likewise included within the scope of patent protection of the present invention.

Claims

1. An automatic defect location analysis method, characterized by comprising the following steps:

acquiring defect data to be analyzed, extracting keywords from the defect data to be analyzed, and exactly matching the keywords against a preset defect knowledge base based on a preset text similarity algorithm, wherein the defect knowledge base comprises multiple classes of defect sample data together with the defect causes and repair schemes corresponding to each class;

when it is detected that the current exact matching has failed, identifying the defect type to which the defect data to be analyzed belongs, and searching the defect knowledge base for a target defect cause and a target repair scheme corresponding to that defect type, so as to complete fuzzy matching of the defect data to be analyzed;

and taking the fuzzy-matched defect data to be analyzed as defect data to be verified, and, when the accurate defect cause and accurate repair scheme of the defect data to be verified are obtained, updating them into the defect knowledge base so as to calibrate the defect data to be verified.
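
The overall flow of claim 1 (exact match, fuzzy fallback, verification, write-back) can be illustrated with a minimal Python sketch. It is not the patented implementation: the names `KBEntry`, `DefectKnowledgeBase`, `analyze`, and `calibrate` are hypothetical, the "preset text similarity algorithm" is replaced by a plain Jaccard overlap of keyword sets, the defect-type classifier is passed in as a function, and the threshold of 0.6 is an arbitrary example value.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class KBEntry:
    keywords: frozenset
    defect_type: str
    cause: str
    fix: str


@dataclass
class DefectKnowledgeBase:
    entries: list = field(default_factory=list)
    # Fuzzy-matched reports awaiting an accurate cause and fix ("data to be verified").
    pending: dict = field(default_factory=dict)

    def exact_match(self, keywords: frozenset, threshold: float) -> Optional[KBEntry]:
        # Placeholder similarity (an assumption): Jaccard overlap of keyword sets.
        best, best_score = None, 0.0
        for entry in self.entries:
            union = keywords | entry.keywords
            score = len(keywords & entry.keywords) / len(union) if union else 0.0
            if score > best_score:
                best, best_score = entry, score
        return best if best_score > threshold else None

    def analyze(self, report_id: str, keywords: frozenset,
                classify: Callable[[frozenset], str], threshold: float = 0.6):
        hit = self.exact_match(keywords, threshold)
        if hit is not None:
            return "exact", hit.cause, hit.fix
        # Exact match failed: fall back to the first entry of the same defect type.
        defect_type = classify(keywords)
        for entry in self.entries:
            if entry.defect_type == defect_type:
                self.pending[report_id] = (keywords, defect_type)
                return "fuzzy", entry.cause, entry.fix
        return "unmatched", None, None

    def calibrate(self, report_id: str, cause: str, fix: str) -> None:
        # Once the accurate cause and fix are known, write them back to the knowledge base.
        keywords, defect_type = self.pending.pop(report_id)
        self.entries.append(KBEntry(keywords, defect_type, cause, fix))
```

In this sketch a report that only fuzzy-matches is parked in `pending`; after `calibrate` records its verified cause and fix, a later report with the same keywords matches exactly instead of falling back again.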

2. The automatic defect location analysis method according to claim 1, wherein the step of, when it is detected that the current exact matching has failed, identifying the defect type to which the defect data to be analyzed belongs and searching the defect knowledge base for a target defect cause and a target repair scheme corresponding to the defect type so as to complete fuzzy matching of the defect data to be analyzed comprises:

when it is detected that the current exact matching has failed, recognizing entity information in the defect data to be analyzed by using a preset entity recognition model, and obtaining a problem template based on the entity information;

performing multi-level semantic analysis on the defect data to be analyzed to obtain multi-level semantics of the defect data to be analyzed;

predicting, by using a preset probabilistic graphical model in combination with the problem template and the multi-level semantics, the defect category in the defect knowledge base to which the defect data to be analyzed corresponds;

and converting the defect data to be analyzed into a structured query against the defect knowledge base according to the defect category and the entity information, and executing the query to obtain the target defect cause and the target repair scheme, so as to complete fuzzy matching of the defect data to be analyzed.
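
The fuzzy-matching steps of claim 2 can be sketched as follows, under several assumptions: the "preset entity recognition model" is replaced by two hypothetical regex rules, the problem template is the report text with entities replaced by their labels, the template tokens serve as a crude stand-in for the multi-level semantics, a multinomial naive Bayes classifier (one simple kind of probabilistic graphical model) plays the role of the unspecified model, and the knowledge base is an in-memory SQLite table with a hypothetical `defects` schema queried through bound parameters.

```python
import math
import re
import sqlite3
from collections import Counter, defaultdict

# Hypothetical regex rules standing in for the "preset entity recognition model".
ENTITY_PATTERNS = {
    "EXCEPTION": re.compile(r"\b\w+(?:Exception|Error)\b"),
    "HTTP_STATUS": re.compile(r"\b[45]\d\d\b"),
}


def recognize_entities(text):
    found = {}
    for label, pattern in ENTITY_PATTERNS.items():
        matches = pattern.findall(text)
        if matches:
            found[label] = matches
    return found


def problem_template(text, entities):
    # Replace each recognized entity with its label, e.g. "NullPointerException" -> "<EXCEPTION>".
    for label in entities:
        text = ENTITY_PATTERNS[label].sub(f"<{label}>", text)
    return text


class NaiveBayes:
    """Multinomial naive Bayes with Laplace smoothing: a directed graphical model
    (category node -> token nodes), used here in place of the unspecified model."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.token_counts = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            self.token_counts[label].update(doc.lower().split())
        self.vocab = {tok for counts in self.token_counts.values() for tok in counts}
        return self

    def predict(self, doc):
        total = sum(self.class_counts.values())
        best, best_lp = None, -math.inf
        for label, n in self.class_counts.items():
            counts = self.token_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            lp = math.log(n / total)  # log prior
            for tok in doc.lower().split():
                lp += math.log((counts[tok] + 1) / denom)  # smoothed log likelihood
            if lp > best_lp:
                best, best_lp = label, lp
        return best


def structured_query(conn, category, entities):
    exception = (entities.get("EXCEPTION") or [""])[0]
    # Restrict to the predicted category; prefer rows recording the same exception.
    return conn.execute(
        "SELECT cause, fix FROM defects WHERE category = ? "
        "ORDER BY (exception = ?) DESC LIMIT 1",
        (category, exception),
    ).fetchone()


def fuzzy_match(conn, model, text):
    entities = recognize_entities(text)
    category = model.predict(problem_template(text, entities))
    return category, structured_query(conn, category, entities)
```

Templating before classification lets reports that differ only in concrete values (a specific status code or exception class) share the same tokens, which is one plausible reading of why the claim derives a problem template from the entities.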

3. The automatic defect location analysis method according to claim 1, wherein the step of taking the fuzzy-matched defect data to be analyzed as defect data to be verified, and updating the accurate defect cause and accurate repair scheme into the defect knowledge base when they are obtained for the defect data to be verified so as to calibrate the defect data to be verified, further comprises:

marking the calibrated defect data as characteristic defect data;

and when fuzzy matching is currently being performed, preferentially selecting the characteristic defect data for matching.
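
A minimal sketch of claim 3, assuming a hypothetical `DefectSample` record carrying a boolean flag: calibrated samples are flagged as characteristic, and fuzzy matching orders same-type candidates so that flagged ones are tried first.

```python
from dataclasses import dataclass


@dataclass
class DefectSample:
    defect_type: str
    cause: str
    fix: str
    characteristic: bool = False  # True once the sample has been calibrated


def mark_characteristic(sample: DefectSample) -> DefectSample:
    # Calibrated samples are flagged as characteristic defect data.
    sample.characteristic = True
    return sample


def fuzzy_candidates(samples, defect_type):
    # Same-type samples, characteristic ones first; sorted() is stable, so the
    # original order is otherwise preserved (False sorts before True).
    same_type = [s for s in samples if s.defect_type == defect_type]
    return sorted(same_type, key=lambda s: not s.characteristic)
```

Sorting rather than filtering keeps uncalibrated samples available as a fallback when no characteristic sample of the predicted type exists yet.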

4. The automatic defect location analysis method according to claim 1, wherein before the step of acquiring the defect data to be analyzed and extracting keywords from it, the method further comprises:

acquiring defect sample data, and performing pre-screening and format conversion on the defect sample data to obtain target sample data;

classifying the target sample data based on a preset classification algorithm to obtain multi-class defect sample data corresponding to a plurality of defect classes;

extracting and screening the multi-class defect sample data to obtain the defect causes and repair schemes corresponding to each defect class, and establishing a mapping among the defect classes, the defect causes, and the repair schemes;

and when it is detected that the data volume of the multi-class defect sample data has reached a preset data volume threshold, constructing the defect knowledge base so that automatic defect location analysis can be performed based on it.
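
Claim 4's construction steps might look like the following sketch. The record fields (`title`, `cause`, `fix`), the keyword rules used in place of the "preset classification algorithm", and the `min_samples` threshold are all illustrative assumptions; "pre-screening" is read here as dropping empty and duplicate reports, and "format conversion" as normalizing each record to one dictionary shape.

```python
from collections import defaultdict

# Hypothetical keyword rules standing in for the "preset classification algorithm".
CATEGORY_RULES = {
    "Timeout": ("timeout", "timed out"),
    "NullReference": ("null", "nullpointer"),
}


def prescreen_and_convert(raw_records):
    """Drop empty or duplicate reports and normalize each record to one dict shape."""
    seen, samples = set(), []
    for rec in raw_records:
        title = (rec.get("title") or "").strip().lower()
        if not title or title in seen:
            continue
        seen.add(title)
        samples.append({
            "title": title,
            "cause": (rec.get("cause") or "").strip(),
            "fix": (rec.get("fix") or "").strip(),
        })
    return samples


def classify(title):
    for category, words in CATEGORY_RULES.items():
        if any(w in title for w in words):
            return category
    return "Other"


def build_knowledge_base(raw_records, min_samples):
    samples = prescreen_and_convert(raw_records)
    kb = defaultdict(list)  # defect class -> [(cause, fix), ...]
    for s in samples:
        if s["cause"] and s["fix"]:  # keep only samples with a usable cause and fix
            kb[classify(s["title"])].append((s["cause"], s["fix"]))
    # Only publish the knowledge base once enough sample data has accumulated.
    return dict(kb) if len(samples) >= min_samples else None
```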

5. The automatic defect location analysis method according to claim 1, wherein the text similarity algorithm comprises a cosine distance algorithm, and

the step of acquiring the defect data to be analyzed, extracting keywords from it, and exactly matching the keywords against a preset defect knowledge base based on a preset text similarity algorithm comprises:

performing word segmentation on the defect data to be analyzed by using natural language processing techniques to extract the keywords in the defect data to be analyzed;

generating a first word frequency vector set of the keywords in the defect data to be analyzed and a second word frequency vector set of the keywords in the defect knowledge base;

and obtaining, by using the cosine distance algorithm, a cosine similarity set between the first word frequency vector set and the second word frequency vector set, so as to exactly match the keywords in the defect knowledge base based on the cosine similarity set.
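
For claim 5, the cosine similarity of two word-frequency vectors a and b is (a · b) / (|a| * |b|), and the cosine distance is 1 minus that value. The sketch below assumes English text and uses lowercase regex tokens minus a small hypothetical stopword list in place of real word segmentation (Chinese reports would need a proper segmenter); it returns one similarity per knowledge-base entry, which plays the role of the cosine similarity set.

```python
import math
import re
from collections import Counter

# Hypothetical stopword list; a real system would use a curated one.
STOPWORDS = {"the", "a", "an", "on", "when", "is", "in", "at", "to", "of"}


def extract_keywords(text):
    # Stand-in for NLP word segmentation: lowercase word tokens minus stopwords.
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS]


def cosine_similarity(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def similarity_set(report, kb_texts):
    query_tf = Counter(extract_keywords(report))  # first word-frequency vector
    # One second word-frequency vector per knowledge-base entry.
    return [cosine_similarity(query_tf, Counter(extract_keywords(t))) for t in kb_texts]
```

For example, "null pointer login" and "null pointer payment" share two of three unit-weight terms, so their similarity is 2 / (sqrt(3) * sqrt(3)) = 2/3.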

6. The automatic defect location analysis method according to claim 5, wherein after the step of obtaining the cosine similarity set between the first word frequency vector set and the second word frequency vector set by using the cosine distance algorithm, so as to exactly match the keywords in the defect knowledge base based on the cosine similarity set, the method further comprises:

determining whether the cosine similarity set contains a target cosine similarity exceeding a preset similarity threshold;

if so, determining that the current exact matching has succeeded;

if not, determining that the current exact matching has failed.

7. The automatic defect location analysis method according to claim 6, wherein after the step of acquiring the defect data to be analyzed and extracting keywords from it so as to exactly match the keywords against a preset defect knowledge base based on a preset text similarity algorithm, the method further comprises:

when it is detected that the cosine similarity set contains a target cosine similarity exceeding the preset similarity threshold, acquiring the matched defect sample data in the defect knowledge base that corresponds to the target cosine similarity;

and acquiring the matched defect cause and matched repair scheme corresponding to the matched defect sample data, and displaying the matched defect cause and matched repair scheme in association with each other.
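
Claims 6 and 7 together amount to a threshold test followed by a lookup. This sketch assumes the similarity set is aligned index-by-index with the knowledge-base samples, reads "exceeding" as strictly greater than the threshold, and picks the highest-scoring sample when several exceed it (the claims do not say how multiple hits are resolved). `MatchedSample`, `exact_match`, and `display` are hypothetical names.

```python
from typing import NamedTuple, Optional, Sequence


class MatchedSample(NamedTuple):
    text: str
    cause: str
    fix: str


def exact_match(similarities: Sequence[float], samples: Sequence[MatchedSample],
                threshold: float) -> Optional[MatchedSample]:
    # Succeed only if some similarity strictly exceeds the threshold.
    if not similarities:
        return None
    best = max(range(len(similarities)), key=lambda i: similarities[i])
    if similarities[best] <= threshold:
        return None  # exact matching failed; the caller falls back to fuzzy matching
    return samples[best]


def display(sample: MatchedSample) -> str:
    # Present the matched cause and repair scheme together.
    return f"Cause: {sample.cause} | Fix: {sample.fix}"
```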

8. The automatic defect location analysis method according to any one of claims 1 to 7, wherein the step of taking the fuzzy-matched defect data to be analyzed as defect data to be verified, and updating the accurate defect cause and accurate repair scheme into the defect knowledge base when they are obtained for the defect data to be verified so as to calibrate the defect data to be verified, further comprises:

periodically screening from the defect knowledge base, at a preset time interval, common defect data whose actual number of reproductions exceeds a preset count threshold, and determining the common defect types, common defect causes, and common repair schemes corresponding to the common defect data, so as to generate a visual common defect statistics table.
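
Claim 8's periodic statistics could be approximated as below, assuming each reproduction of a defect is logged as a `(defect_type, cause, fix)` tuple. The hypothetical `common_defect_table` function computes a single snapshot; running it at the preset interval would be left to a scheduler such as cron, and the plain-text table stands in for the visual statistics table.

```python
from collections import Counter


def common_defect_table(occurrences, min_reproductions):
    """occurrences: one (defect_type, cause, fix) tuple per observed reproduction.

    Returns the rows whose reproduction count exceeds the threshold, most frequent
    first, plus a plain-text rendering of them."""
    counts = Counter(occurrences)
    rows = [(t, c, f, n) for (t, c, f), n in counts.items() if n > min_reproductions]
    rows.sort(key=lambda r: (-r[3], r[0]))  # by count descending, then by type
    lines = ["type | cause | fix | reproductions"]
    lines += [f"{t} | {c} | {f} | {n}" for t, c, f, n in rows]
    return rows, "\n".join(lines)
```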

9. An automatic defect location analysis device, characterized in that it comprises a processor, a memory, and an automatic defect location analysis program stored in the memory and executable by the processor, wherein the automatic defect location analysis program, when executed by the processor, implements the steps of the automatic defect location analysis method according to any one of claims 1 to 8.

10. A computer-readable storage medium, characterized in that an automatic defect location analysis program is stored on the computer-readable storage medium, wherein the automatic defect location analysis program, when executed by a processor, implements the steps of the automatic defect location analysis method according to any one of claims 1 to 8.