CN115687979A - Identification method and device, electronic equipment, and storage medium of specified technology in threat intelligence - Google Patents
- Publication number
- CN115687979A (application CN202211387653.0A)
- Authority
- CN
- China
- Prior art keywords
- paragraph
- word
- technology
- specified
- threat intelligence
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Landscapes
- Machine Translation (AREA)
- Financial Or Insurance-Related Operations Such As Payment And Settlement (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Description
Technical Field
本申请涉及网络安全技术领域,特别涉及一种威胁情报中指定技术的识别方法及装置、电子设备、计算机可读存储介质。The present application relates to the technical field of network security, and in particular to a method and device for identifying a specified technology in threat intelligence, electronic equipment, and a computer-readable storage medium.
Background
Threat intelligence is defined as "evidence-based knowledge, including context, mechanisms, indicators, implications, and actionable advice, about an existing or emerging threat or hazard to assets, which can be used to inform the decision-maker's response to that threat or hazard". Threat intelligence in the field of cybersecurity, or cyber threat intelligence, provides timely information such as the characteristics of an attack, and helps reduce the uncertainty in identifying potential security vulnerabilities and attacks. Individuals or enterprises can obtain cyber threat intelligence from social media (for example, blogs), vendor announcements (Microsoft, Cisco, etc.), hacker forums, and other channels.
However, the format of cyber threat intelligence is not fixed: a technique involved may be referred to by a standard identifier, or it may be given only a descriptive statement with no standard identifier. For example, for the "Sudo and Sudo Caching" technique, cyber threat intelligence may state the technique name directly, as in "T1548.003 Sudo and Sudo Caching", or may describe it in text, as in "Adversaries may perform sudo caching and/or use the sudoers file to elevate privileges. Adversaries may do this to execute commands as other users or spawn processes with higher privileges".
对于网络威胁情报的使用者(个人或企业)而言,可能存在部分需要特别关注的技术,以借助这些技术提高抵御网络威胁的能力。因此,亟需一种能够从网络威胁情报中准确识别指定技术的方案。For users of cyber threat intelligence (individuals or enterprises), there may be some technologies that need special attention, so as to use these technologies to improve the ability to resist cyber threats. Therefore, there is an urgent need for a solution that can accurately identify specified technologies from cyber threat intelligence.
Summary of the Invention
本申请实施例的目的在于提供一种威胁情报中指定技术的识别方法及装置、电子设备、计算机可读存储介质,用于从网络威胁情报中准确识别出有关指定技术的内容。The purpose of the embodiments of the present application is to provide a method and device for identifying specified technologies in threat intelligence, electronic equipment, and a computer-readable storage medium, for accurately identifying content related to specified technologies from network threat intelligence.
一方面,本申请提供了一种威胁情报中指定技术的识别方法,包括:In one aspect, the present application provides a method for identifying specified technologies in threat intelligence, including:
对网络威胁情报进行预处理,得到所述网络威胁情报中每一段落对应的词语序列;Preprocessing the cyber threat intelligence to obtain a sequence of words corresponding to each paragraph in the cyber threat intelligence;
针对每一段落对应的词语序列,为所述词语序列添加词语掩码后,输入已训练的完形填空模型,获得所述完形填空模型输出的对应于所述词语掩码的预测词语;For the word sequence corresponding to each paragraph, after adding a word mask for the word sequence, input the trained cloze model to obtain the predicted words corresponding to the word mask output by the cloze model;
For the word sequence corresponding to each paragraph, input the word sequence into the trained technology classification model, obtain the multiple prediction categories output by the technology classification model and the confidence corresponding to each prediction category, and select the several prediction categories with the highest confidence as the target prediction categories corresponding to the paragraph; wherein each prediction category indicates a technology name belonging to the specified technology;
For each paragraph, determine whether any target prediction category corresponding to the paragraph contains the predicted word corresponding to the paragraph;
根据每一段落对应的判断结果,确定所述段落是否包括指定技术。According to the judgment result corresponding to each paragraph, it is determined whether the paragraph includes the specified technology.
通过上述措施,将网络威胁情报拆分出多个段落后,借助完形填空模型和技术分类模型针对各个段落分别进行指定技术的识别,从而准确识别出存在指定技术相关内容的段落。Through the above measures, after the network threat intelligence is split into multiple paragraphs, the specified technology is identified for each paragraph with the help of the cloze model and the technology classification model, so as to accurately identify the paragraphs with content related to the specified technology.
在一实施例中,在所述对网络威胁情报进行预处理,得到所述网络威胁情报中每一段落对应的词语序列之前,所述方法还包括:In one embodiment, before the preprocessing of the cyber threat intelligence to obtain the word sequence corresponding to each paragraph in the cyber threat intelligence, the method further includes:
对所述网络威胁情报,以所述指定技术下多个技术名称进行正则匹配,判断能否匹配到任一技术名称;For the network threat intelligence, perform regular matching with multiple technical names under the specified technology, and determine whether any technical name can be matched;
如果匹配到任一技术名称,确定所述网络威胁情报包括所述指定技术;If any technology name is matched, determining that the cyber threat intelligence includes the specified technology;
If no technology name can be matched, continue with the step of preprocessing the cyber threat intelligence.
通过上述措施,可以在网络威胁情报包含指定技术下的技术名称的情况下,快速识别出网络威胁情报中的指定技术,从而降低了识别任务的工作量。Through the above measures, the specified technology in the network threat intelligence can be quickly identified when the network threat intelligence contains the technical name under the specified technology, thereby reducing the workload of the identification task.
在一实施例中,所述对网络威胁情报进行预处理,得到所述网络威胁情报中每一段落对应的词语序列,包括:In one embodiment, the preprocessing of the cyber threat intelligence to obtain the word sequence corresponding to each paragraph in the cyber threat intelligence includes:
将所述网络威胁情报划分为若干段落;Divide said cyber threat intelligence into paragraphs;
针对每一段落进行分词,并从分词结果中滤除停用词和无效词;Perform word segmentation for each paragraph, and filter out stop words and invalid words from word segmentation results;
针对每一段落,对经过滤除处理的分词结果,进行词干提取,得到所述段落对应的词语序列。For each paragraph, stem extraction is performed on the filtered word segmentation result to obtain a word sequence corresponding to the paragraph.
通过上述措施,可以将网络威胁情报处理为若干段落对应的词语序列。Through the above measures, the network threat intelligence can be processed into word sequences corresponding to several paragraphs.
在一实施例中,所述完形填空模型通过如下方式训练得到:In one embodiment, the cloze model is trained as follows:
针对样本数据集中的样本语料,以词语掩码替换所述样本语料中的至少一个词语,得到指定样本语料;For the sample corpus in the sample data set, replace at least one word in the sample corpus with a word mask to obtain a specified sample corpus;
将所述指定样本语料输入预训练模型,获得所述指定样本语料中词语掩码对应的样本预测结果;Inputting the specified sample corpus into the pre-training model to obtain a sample prediction result corresponding to the word mask in the specified sample corpus;
根据所述指定样本语料中词语掩码对应的样本预测结果和被替换词语,对所述预训练模型的模型参数进行调整,得到完形填空模型。According to the sample prediction results corresponding to the word masks in the specified sample corpus and the replaced words, the model parameters of the pre-trained model are adjusted to obtain a cloze model.
通过上述措施,可以训练得到完形填空模型。Through the above measures, the cloze model can be trained.
在一实施例中,所述样本语料包括技术名称和技术描述;In one embodiment, the sample corpus includes technical names and technical descriptions;
所述以词语掩码替换所述样本语料中的至少一个词语,包括:The replacing at least one word in the sample corpus with a word mask includes:
从所述样本语料所包含的技术名称中选择一个词语,替换为词语掩码;和/或,select a word from the technical names contained in the sample corpus, and replace it with a word mask; and/or,
从所述样本语料所包含的技术描述中选择所述指定技术的一个相关词语,替换为词语掩码;和/或,Select a relevant word of the specified technology from the technical description contained in the sample corpus, and replace it with a word mask; and/or,
随机选择所述样本语料中的至少一个词语,替换为词语掩码。Randomly select at least one word in the sample corpus and replace it with a word mask.
通过上述措施,可以将样本语料处理为指定样本语料。Through the above measures, the sample corpus can be processed into a specified sample corpus.
在一实施例中,所述技术分类模型通过如下方式训练得到:In one embodiment, the technology classification model is trained as follows:
将样本数据集中样本语料所包括的技术描述,输入至分类模型,获得所述分类模型输出的样本预测类别;Inputting the technical description included in the sample corpus in the sample data set to the classification model, and obtaining the sample prediction category output by the classification model;
根据所述样本语料的样本预测类别与所述样本语料所包含的技术名称之间的差异,调整所述分类模型的模型参数,得到技术分类模型。According to the difference between the sample prediction category of the sample corpus and the technology names contained in the sample corpus, adjust the model parameters of the classification model to obtain a technology classification model.
通过上述措施,可以训练得到技术分类模型。Through the above measures, a technology classification model can be trained.
在一实施例中,所述根据每一段落对应的判断结果,确定所述段落是否包括指定技术,包括:In an embodiment, the determining whether the paragraph includes the specified technology according to the judgment result corresponding to each paragraph includes:
如果任一段落对应的判断结果,指示存在包括预测词语的目标预测类别,确定所述段落包括指定技术;If the judgment result corresponding to any paragraph indicates that there is a target prediction category including the predicted word, it is determined that the paragraph includes the specified technology;
如果任一段落对应的判断结果,指示不存在包括预测词语的目标预测类别,确定所述段落不包括指定技术。If the judgment result corresponding to any paragraph indicates that there is no target prediction category including the predicted word, it is determined that the paragraph does not include the specified technology.
通过上述措施,可以从网络威胁情报中识别出若干包含指定技术的段落。Through the measures described above, several passages containing specified techniques can be identified from cyber threat intelligence.
另一方面,本申请还包括一种威胁情报中指定技术的识别装置,包括:On the other hand, this application also includes a device for identifying technologies specified in threat intelligence, including:
预处理模块,用于对网络威胁情报进行预处理,得到所述网络威胁情报中每一段落对应的词语序列;A preprocessing module, configured to preprocess the cyber threat intelligence to obtain a word sequence corresponding to each paragraph in the cyber threat intelligence;
A first prediction module, configured to, for the word sequence corresponding to each paragraph, add a word mask to the word sequence, input the result into the trained cloze model, and obtain the predicted word that the cloze model outputs for the word mask;
A second prediction module, configured to, for the word sequence corresponding to each paragraph, input the word sequence into the trained technology classification model, obtain the multiple prediction categories output by the technology classification model and the confidence corresponding to each prediction category, and select the several prediction categories with the highest confidence as the target prediction categories corresponding to the paragraph; wherein each prediction category indicates a technology name belonging to the specified technology;
A judging module, configured to, for each paragraph, determine whether any target prediction category corresponding to the paragraph contains the predicted word corresponding to the paragraph;
确定模块,用于根据每一段落对应的判断结果,确定所述段落是否包括指定技术。The determining module is configured to determine whether the paragraph includes the specified technology according to the judgment result corresponding to each paragraph.
此外,本申请还包括一种电子设备,所述电子设备包括:In addition, the present application also includes an electronic device, which includes:
处理器;processor;
用于存储处理器可执行指令的存储器;memory for storing processor-executable instructions;
其中,所述处理器被配置为执行上述威胁情报中指定技术的识别方法。Wherein, the processor is configured to execute the identification method of the technology specified in the above threat intelligence.
进一步的,本申请还包括一种计算机可读存储介质,所述存储介质存储有计算机程序,所述计算机程序可由处理器执行以完成上述威胁情报中指定技术的识别方法。Further, the present application also includes a computer-readable storage medium, the storage medium stores a computer program, and the computer program can be executed by a processor to complete the identification method of the technology specified in the above threat intelligence.
Brief Description of the Drawings
为了更清楚地说明本申请实施例的技术方案,下面将对本申请实施例中所需要使用的附图作简单地介绍。In order to illustrate the technical solutions of the embodiments of the present application more clearly, the following briefly introduces the drawings that are used in the embodiments of the present application.
图1为本申请一实施例提供的威胁情报中指定技术的识别方法的应用场景示意图;FIG. 1 is a schematic diagram of an application scenario of an identification method for a specified technology in threat intelligence provided by an embodiment of the present application;
图2为本申请一实施例提供的电子设备的结构示意图;FIG. 2 is a schematic structural diagram of an electronic device provided by an embodiment of the present application;
图3为本申请一实施例提供的威胁情报中指定技术的识别方法的流程示意图;FIG. 3 is a schematic flowchart of a method for identifying a specified technology in threat intelligence provided by an embodiment of the present application;
图4为本申请一实施例提供的威胁情报中指定技术的初步识别方法的流程示意图;FIG. 4 is a schematic flowchart of a preliminary identification method for a specified technology in threat intelligence provided by an embodiment of the present application;
FIG. 5 is a detailed schematic flowchart of step 310 in FIG. 3 provided by an embodiment of the present application;
图6为本申请一实施例提供的完形填空模型的训练方法的流程示意图;Fig. 6 is a schematic flow chart of a training method for a cloze model provided by an embodiment of the present application;
图7为本申请一实施例提供的技术分类模型的训练方法的流程示意图;FIG. 7 is a schematic flowchart of a training method for a technology classification model provided by an embodiment of the present application;
图8为本申请一实施例提供的威胁情报中指定技术的识别方法的整体示意图;FIG. 8 is an overall schematic diagram of an identification method for a specified technology in threat intelligence provided by an embodiment of the present application;
图9为本申请另一实施例提供的威胁情报中指定技术的识别方法的流程示意图;FIG. 9 is a schematic flowchart of a method for identifying a specified technology in threat intelligence provided by another embodiment of the present application;
图10为本申请一实施例提供的威胁情报中指定技术的识别装置的框图。FIG. 10 is a block diagram of an identification device for a specified technology in threat intelligence provided by an embodiment of the present application.
Detailed Description
下面将结合本申请实施例中的附图,对本申请实施例中的技术方案进行描述。The technical solutions in the embodiments of the present application will be described below with reference to the drawings in the embodiments of the present application.
相似的标号和字母在下面的附图中表示类似项,因此,一旦某一项在一个附图中被定义,则在随后的附图中不需要对其进行进一步定义和解释。同时,在本申请的描述中,术语“第一”、“第二”等仅用于区分描述,而不能理解为指示或暗示相对重要性。Like numbers and letters denote similar items in the following figures, so that once an item is defined in one figure, it does not require further definition and explanation in subsequent figures. Meanwhile, in the description of the present application, the terms "first", "second" and the like are only used to distinguish descriptions, and cannot be understood as indicating or implying relative importance.
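FIG. 1 is a schematic diagram of an application scenario of the method for identifying a specified technology in threat intelligence provided by an embodiment of the present application. As shown in FIG. 1, the application scenario includes a client 20 and a server 30. The client 20 may be a user terminal such as a host, mobile phone, or tablet computer, and is used to send a manually constructed sample data set to the server 30. The server 30 may be a server, a server cluster, or a cloud computing center; it can train a cloze model and a technology classification model on the sample corpora in the sample data set, and then use the two models to identify content in cyber threat intelligence that contains features of the specified technology.
As shown in FIG. 2, this embodiment provides an electronic device 1, including at least one processor 11 and a memory 12; FIG. 2 takes one processor 11 as an example. The processor 11 and the memory 12 are connected through a bus 10. The memory 12 stores instructions executable by the processor 11, and when the instructions are executed by the processor 11, the electronic device 1 can perform all or part of the flow of the methods in the embodiments described below. In an embodiment, the electronic device 1 may be the above server 30, used to execute the method for identifying a specified technology in threat intelligence.
The memory 12 may be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic disk, or optical disk.
The present application also provides a computer-readable storage medium storing a computer program, where the computer program can be executed by the processor 11 to carry out the method for identifying a specified technology in threat intelligence provided by the present application.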
参见图3,为本申请一实施例提供的威胁情报中指定技术的识别方法的流程示意图,如图3所示,该方法可以包括以下步骤310-步骤350。Referring to FIG. 3 , it is a schematic flowchart of a method for identifying a specified technology in threat intelligence provided by an embodiment of the present application. As shown in FIG. 3 , the method may include the following steps 310 - 350 .
步骤310:对网络威胁情报进行预处理,得到网络威胁情报中每一段落对应的词语序列。Step 310: Preprocessing the cyber threat intelligence to obtain the word sequence corresponding to each paragraph in the cyber threat intelligence.
The solution of the present application is used to identify content containing a specified technology in cyber threat intelligence. Here, the specified technology is a technology that users of cyber threat intelligence are particularly concerned with, and it can be configured according to user needs. There may be one or more specified technologies. For example, the specified technology may be a technique in the MITRE ATT&CK (Adversarial Tactics, Techniques, and Common Knowledge) knowledge base, or a technique in the CAPEC (Common Attack Pattern Enumeration and Classification) data set. A specified technology may include multiple subdivided technologies.
服务端在从互联网或本地存储空间获取需要进行识别的网络威胁情报后,可以对该网络威胁情报进行预处理,从而将网络威胁情报拆分为多个段落,并获得各个段落对应的词语序列。其中,词语序列包括段落内多个词语。After the server obtains the network threat intelligence that needs to be identified from the Internet or local storage space, it can preprocess the network threat intelligence, thereby splitting the network threat intelligence into multiple paragraphs, and obtaining the word sequence corresponding to each paragraph. Wherein, the word sequence includes multiple words in the paragraph.
步骤320:针对每一段落对应的词语序列,为词语序列添加词语掩码后,输入已训练的完形填空模型,获得完形填空模型输出的对应于词语掩码的预测词语。Step 320: After adding a word mask to the word sequence corresponding to each paragraph, input the trained cloze model to obtain the predicted word corresponding to the word mask output by the cloze model.
在获得网络威胁情报中各个段落的词语序列后,服务端可以为每一段落对应的词语序列添加词语掩码,该词语掩码用于指示在词语序列中预测出新词语的位置。示例性的,词语掩码可以添加在词语序列的最前面,也就是将词语序列接在词语掩码后面;或者,词语掩码可以添加在词语序列的最后面,也就是将词语掩码接在词语序列后面。词语掩码的形式可以预先配置,示例性的,词语掩码可以为[MASK]。After obtaining the word sequence of each paragraph in the network threat intelligence, the server can add a word mask to the word sequence corresponding to each paragraph, and the word mask is used to indicate the position where the new word is predicted in the word sequence. Exemplarily, the word mask can be added at the front of the word sequence, that is, the word sequence is connected to the word mask; or, the word mask can be added at the end of the word sequence, that is, the word mask is connected to after the sequence of words. The form of the word mask can be configured in advance, for example, the word mask can be [MASK].
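The mask placement described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `add_word_mask` and the `position` parameter are hypothetical, and the sample word sequence is made up.

```python
MASK_TOKEN = "[MASK]"  # mask form mentioned in the text; the text says it is configurable


def add_word_mask(words, position="front", mask=MASK_TOKEN):
    """Return a new word sequence with one mask token prepended or appended.

    `position` is a hypothetical parameter: "front" puts the mask before the
    sequence, "back" puts it after the sequence.
    """
    if position == "front":
        return [mask] + list(words)
    if position == "back":
        return list(words) + [mask]
    raise ValueError(f"unknown position: {position!r}")


# Illustrative (made-up) word sequence for one paragraph.
sequence = ["adversarie", "perform", "sudo", "cach"]
masked_front = add_word_mask(sequence, position="front")
masked_back = add_word_mask(sequence, position="back")
```

The masked sequence would then be passed to whatever trained cloze model is in use; that call is omitted here because the source does not fix a particular model API.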
After the word mask is added, the word sequence can be input into the cloze model. The cloze model can be obtained by training a natural language model, and is used to predict a new word from the textual context. The natural language model may be, but is not limited to, BERT (Bidirectional Encoder Representations from Transformers), T5 (Text-To-Text Transfer Transformer), or mT5 (A Massively Multilingual Pre-trained Text-to-Text Transformer). Using the cloze model, the server can generate a predicted word for the position of the word mask based on the other words in the word sequence. The predicted word is the word most likely to appear at the mask position given the other words in the sequence.
针对每一段落,可以分别通过完形填空模型输出该段落的词语序列对应的预测词语。For each paragraph, the predicted word corresponding to the word sequence of the paragraph can be output through the cloze model respectively.
Step 330: For the word sequence corresponding to each paragraph, input the word sequence into the trained technology classification model, obtain the multiple prediction categories output by the technology classification model and the confidence corresponding to each prediction category, and select the several prediction categories with the highest confidence as the target prediction categories corresponding to the paragraph; wherein each prediction category indicates a technology name belonging to the specified technology.
For the word sequence corresponding to each paragraph, the server can input the word sequence into the technology classification model. The technology classification model is used to classify text and can be obtained by training a classification model. The classification model may be, but is not limited to, FastText, SVM (Support Vector Machine), or GBDT (Gradient Boosting Decision Tree).
技术分类模型所能输出的预测类别可以根据需求进行配置。示例性的,指定技术中包括n种细分技术的技术名称,此时,可以训练技术分类模型对n个细分技术进行分类。The forecast categories that the technology classification model can output can be configured according to requirements. Exemplarily, the specified technology includes technology names of n subdivision technologies. At this time, a technology classification model may be trained to classify the n subdivision technologies.
服务端通过技术分类模型对词语序列进行处理,从而输出多个预测类别以及每个预测类别对应的置信度。服务端可以将多个预测类别的置信度按照从大到小的顺序进行排列,从而选择置信度靠前的若干中预测类别,作为段落对应的目标预测类别。这里,服务端所选择的预测类别的数量可以根据需要进行配置,示例性的,服务端可以选择置信度最靠前的两个预测类别作为目标预测类别。The server processes the word sequence through the technical classification model, thereby outputting multiple prediction categories and the confidence corresponding to each prediction category. The server can arrange the confidence levels of multiple prediction categories in descending order, and select several prediction categories with higher confidence levels as the target prediction category corresponding to the paragraph. Here, the number of prediction categories selected by the server can be configured as required. For example, the server can select the two prediction categories with the highest confidence as target prediction categories.
针对每一段落,服务端可以通过技术分类模型,为该段落生成并选择若干目标预测类别,每一目标预测类别指示该段落的内容可能包含的细分技术的技术名称。For each paragraph, the server can generate and select several target prediction categories for the paragraph through the technology classification model, and each target prediction category indicates the technical name of the subdivided technology that the content of the paragraph may contain.
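The selection of target prediction categories can be sketched as a plain top-k over the classifier's confidences. This is a minimal sketch under assumptions: the function name `select_target_categories` is hypothetical, the classifier output is assumed to be a mapping from technology name to confidence, and the confidence values below are invented for illustration.

```python
def select_target_categories(scores, k=2):
    """Return the k technology names with the highest confidence.

    `scores` is assumed to map technology name -> confidence; k=2 mirrors the
    example in the text, where the two top-ranked categories are kept.
    """
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [name for name, _ in ranked[:k]]


# Invented confidences for one paragraph (not real model output).
example_scores = {
    "Setuid and Setgid": 0.22,
    "Sudo and Sudo Caching": 0.61,
    "Bypass User Account Control": 0.09,
}
targets = select_target_categories(example_scores, k=2)
```

Keeping more than one category trades precision for recall: the later containment check against the predicted word acts as the filter.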
Step 340: For each paragraph, determine whether any target prediction category corresponding to the paragraph contains the predicted word corresponding to the paragraph.
对于任一段落而言,服务端可以检查该段落的每一目标预测类别,是否包含该段落对应的预测词语,从而确定是否存在至少一个目标预测类别包含该预测词语。For any paragraph, the server can check whether each target prediction category of the paragraph contains the predicted word corresponding to the paragraph, so as to determine whether at least one target prediction category contains the predicted word.
步骤350:根据每一段落对应的判断结果,确定段落是否包括指定技术。Step 350: According to the judgment result corresponding to each paragraph, determine whether the paragraph includes the specified technology.
服务端可以分别根据各个段落对应的判断结果,确定段落是否包括指定技术对应的内容。在网络威胁情报的任一段落存在指定技术的情况下,服务端可以提取出该段落,以便后续使用指定技术相关的内容。The server can determine whether the paragraph includes the content corresponding to the specified technology according to the judgment results corresponding to each paragraph. If there is a specified technology in any paragraph of cyber threat intelligence, the server can extract the paragraph so that the content related to the specified technology can be used later.
通过上述措施,将网络威胁情报拆分出多个段落后,借助完形填空模型和技术分类模型针对各个段落分别进行指定技术的识别,从而准确识别出存在指定技术相关内容的段落。Through the above measures, after the network threat intelligence is split into multiple paragraphs, the specified technology is identified for each paragraph with the help of the cloze model and the technology classification model, so as to accurately identify the paragraphs with content related to the specified technology.
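Steps 340 and 350 can be sketched as a containment check followed by a boolean decision. The source does not say how "contains" is evaluated, so the case-insensitive whole-word comparison below is an assumption, and both function names are hypothetical.

```python
def category_contains_word(category, predicted_word):
    """Assumed containment test: case-insensitive whole-word match."""
    return predicted_word.lower() in category.lower().split()


def paragraph_includes_technology(target_categories, predicted_word):
    """Step 340/350 sketch: True if any target category contains the predicted word."""
    return any(
        category_contains_word(category, predicted_word)
        for category in target_categories
    )


# Illustrative inputs: target categories from the classifier, one cloze prediction.
categories = ["Sudo and Sudo Caching", "Setuid and Setgid"]
hit = paragraph_includes_technology(categories, "sudo")
miss = paragraph_includes_technology(categories, "protocol")
```

If the cloze model is trained on stemmed text, the category names would likely need the same stemming before comparison; that detail is not specified in the source.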
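In an embodiment, before the specified technology in the cyber threat intelligence is identified through steps 310 to 350 above, a preliminary identification may first be performed on the cyber threat intelligence. FIG. 4 is a schematic flowchart of the preliminary identification method for a specified technology in threat intelligence provided by an embodiment of the present application. As shown in FIG. 4, the method may include the following steps 410 to 430.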
步骤410:对网络威胁情报,以指定技术下多个技术名称进行正则匹配,判断能否匹配到任一技术名称。Step 410: For the network threat intelligence, perform regular matching with multiple technical names under the specified technology, and determine whether any technical name can be matched.
服务端在获得网络威胁情报之后,可以使用指定技术下全部细分技术的技术名称,对网络威胁情报进行正则匹配,检查网络威胁情报是否匹配到任一技术名称。After obtaining the network threat intelligence, the server can use the technical names of all subdivided technologies under the specified technology to perform regular matching on the network threat intelligence, and check whether the network threat intelligence matches any technical name.
步骤420:如果匹配到任一技术名称,确定网络威胁情报包括指定技术。Step 420: If any technology name is matched, determine that the cyber threat intelligence includes the specified technology.
In one case, if any technology name is matched, the cyber threat intelligence contains content related to the specified technology indicated by that technology name. In this case, the identification flow of steps 310 to 350 above need not be performed on the cyber threat intelligence.
步骤430:如果无法匹配到任一技术名称,继续执行对网络威胁情报进行预处理的步骤。Step 430: If no technical name can be matched, proceed to the step of preprocessing the network threat intelligence.
In the other case, if no technology name under the specified technology can be matched, the cyber threat intelligence does not directly contain a technology name. The identification flow of steps 310 to 350 above then needs to be performed on the cyber threat intelligence, so that content related to the specified technology can still be identified when the intelligence contains only a technical description.
通过上述初步识别的流程,可以在网络威胁情报包含指定技术下的技术名称的情况下,快速识别出网络威胁情报中的指定技术,从而降低了识别任务的工作量。Through the above preliminary identification process, the specified technology in the network threat intelligence can be quickly identified when the network threat intelligence contains the technology name under the specified technology, thereby reducing the workload of the identification task.
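The preliminary identification in steps 410 to 430 can be sketched with Python's standard `re` module. The helper names are hypothetical, and the choices of case-insensitive matching and word-boundary guards are assumptions; the source only says that regular matching is performed with the technology names.

```python
import re


def build_name_pattern(technology_names):
    """Compile one alternation pattern over all technology names.

    Names are escaped so punctuation is matched literally, and sorted longest
    first so that a longer name wins over a shorter name it contains.
    """
    alternatives = sorted(
        (re.escape(name) for name in technology_names), key=len, reverse=True
    )
    return re.compile(
        r"(?<!\w)(?:" + "|".join(alternatives) + r")(?!\w)", re.IGNORECASE
    )


def match_technology_name(report_text, pattern):
    """Return the first matched technology name, or None (fall through to step 310)."""
    found = pattern.search(report_text)
    return found.group(0) if found else None


pattern = build_name_pattern(["Sudo and Sudo Caching", "Setuid and Setgid"])
named = match_technology_name("T1548.003 Sudo and Sudo Caching was observed.", pattern)
described = match_technology_name("Adversaries may perform sudo caching.", pattern)
```

Here `named` holds the matched name, so the report is accepted at step 420, while `described` is `None`, so the report would continue to preprocessing and the model-based flow.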
In an embodiment, FIG. 5 is a detailed schematic flowchart of step 310 in FIG. 3 provided by an embodiment of the present application. As shown in FIG. 5, the preprocessing may include the following steps 311 to 313.
步骤311:将网络威胁情报划分为若干段落。Step 311: Divide the cyber threat intelligence into several paragraphs.
服务端可以将网络威胁情报划分为若干段落。服务端可以直接将网络威胁情报的各个自然段拆分出来,从而得到多个段落。或者,服务端可以将网络威胁情报的各个自然段拆分出来,并将相邻的自然段进行合并(比如:每两个相邻自然段合并为一个段落),从而得到多个段落。或者,服务端可以选择连续的多个句子,作为一个段落,从而划分出多个段落。示例性的,将连续的10个句子划分为一个段落。The server can divide the cyber threat intelligence into several paragraphs. The server can directly split each natural segment of the network threat intelligence to obtain multiple segments. Alternatively, the server may split each natural segment of the network threat intelligence and merge adjacent natural segments (for example: every two adjacent natural segments are merged into one paragraph), thereby obtaining multiple paragraphs. Alternatively, the server may select multiple consecutive sentences as a paragraph, thereby dividing multiple paragraphs. Exemplarily, 10 consecutive sentences are divided into a paragraph.
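The three splitting strategies above can be sketched as follows. This is a rough illustration under assumptions: natural paragraphs are assumed to be separated by blank lines, sentence boundaries are detected with a naive punctuation rule, and all function names are hypothetical.

```python
import re


def split_natural_paragraphs(text):
    """Strategy 1: one paragraph per natural paragraph (blank-line separated, assumed)."""
    return [part.strip() for part in re.split(r"\n\s*\n", text) if part.strip()]


def merge_adjacent(paragraphs, group_size=2):
    """Strategy 2: merge every `group_size` adjacent natural paragraphs into one."""
    return [
        " ".join(paragraphs[i:i + group_size])
        for i in range(0, len(paragraphs), group_size)
    ]


def split_by_sentences(text, sentences_per_paragraph=10):
    """Strategy 3: group consecutive sentences (naive split after . ! or ?)."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    return [
        " ".join(sentences[i:i + sentences_per_paragraph])
        for i in range(0, len(sentences), sentences_per_paragraph)
    ]


natural = split_natural_paragraphs("A.\n\nB.\n\nC.")
merged = merge_adjacent(natural, group_size=2)
grouped = split_by_sentences("One. Two. Three.", sentences_per_paragraph=2)
```

A production splitter would need a more careful sentence tokenizer, since abbreviations and version numbers also contain periods.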
步骤312:针对每一段落进行分词,并从分词结果中滤除停用词和无效词。Step 312: perform word segmentation for each paragraph, and filter out stop words and invalid words from word segmentation results.
针对每一段落,服务端可以对该段落进行分词处理,从而得到多个分词结果,每一分词结果为一个词语。服务端可以借助停用词词表和无效词词表,从多个分词结果中滤除停用词和无效词,从而得到该段落经过滤除处理的分词结果。For each paragraph, the server can perform word segmentation processing on the paragraph to obtain multiple word segmentation results, and each word segmentation result is a word. The server can filter out stop words and invalid words from multiple word segmentation results with the help of the stop word vocabulary and invalid word vocabulary, so as to obtain the word segmentation result of the paragraph after filtering.
步骤313:针对每一段落,对经过滤除处理的分词结果,进行词干提取,得到段落对应的词语序列。Step 313: For each paragraph, perform word stem extraction on the word segmentation result after filtering to obtain the word sequence corresponding to the paragraph.
For the filtered word segmentation result of any paragraph, the server can check whether it contains words from which a stem can be extracted; if so, it removes the suffix of each such word and keeps the stem. For example, common suffixes in English text include "ing" and "s". Words in the segmentation result that have no suffix need no processing. After stemming, the extracted stems and the remaining words without suffixes together form the word sequence corresponding to the paragraph.
通过上述措施,可以将网络威胁情报处理为若干段落对应的词语序列。Through the above measures, the network threat intelligence can be processed into word sequences corresponding to several paragraphs.
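Steps 312 and 313 can be sketched with the standard library only. Everything here is a simplified stand-in: the stop-word and invalid-word lists are tiny hypothetical examples, and the suffix stripper only handles the two suffixes named in the text rather than implementing a real stemming algorithm.

```python
import re

# Hypothetical, deliberately small word lists for illustration.
STOP_WORDS = {"the", "a", "an", "and", "or", "to", "of", "may", "do", "this", "as"}
INVALID_WORDS = {"http", "https", "www"}
SUFFIXES = ("ing", "s")  # the two example suffixes mentioned in the text


def tokenize(paragraph):
    """Lowercase the paragraph and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", paragraph.lower())


def strip_suffix(word):
    """Toy stemmer: drop a known suffix if at least three characters remain."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(paragraph):
    """Tokenize, filter stop words and invalid words, then stem."""
    kept = [
        word for word in tokenize(paragraph)
        if word not in STOP_WORDS and word not in INVALID_WORDS
    ]
    return [strip_suffix(word) for word in kept]


word_sequence = preprocess("Adversaries may perform sudo caching")
```

The toy stemmer leaves "adversaries" as "adversarie"; an established stemmer would normalize such words more consistently, which matters if the same stemming must be applied to technology names later.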
In an embodiment, FIG. 6 is a schematic flowchart of the training method for the cloze model provided by an embodiment of the present application. As shown in FIG. 6, the method may include the following steps 610 to 630.
Step 610: For a sample corpus in the sample data set, replace at least one word in the sample corpus with a word mask to obtain a specified sample corpus.
The sample data set may include multiple sample corpora. Each sample corpus includes the technical name of a subdivided technology under the specified technology, together with the technical description of that subdivided technology.
For any sample corpus, the server may select at least one word from it and put a word mask in place of that word, thereby obtaining a specified sample corpus. For example, if a sample corpus contains 10 words and the second word is selected and replaced with a word mask, the resulting specified sample corpus consists of 9 words plus 1 word mask.
In an embodiment, when replacing at least one word in the sample corpus with a word mask, the server may use one of the following replacement methods or a combination of several of them.
First replacement method: the server may select one word from the technical name contained in the sample corpus and replace it with a word mask. Because a technical name usually consists of several words, masking a different word of the technical name each time yields multiple specified sample corpora from a single sample corpus.
Second replacement method: the server may select, from the technical description contained in the sample corpus, one word related to the specified technology and replace it with a word mask. Here, a related word is a word associated with the specified technology, and the related words may be preconfigured manually. For example, the related words may be "protocol" and "command". The server may search the technical description contained in the sample corpus for the preconfigured related words and replace any one related word found there with a word mask. Because a technical description may contain several related words, masking a different related word each time yields multiple specified sample corpora from a single sample corpus.
Third replacement method: the server may randomly select at least one word in the sample corpus and replace it with a word mask. Here, each randomly selected word is replaced with its own word mask.
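The three replacement methods above might be sketched as follows. This is a minimal illustration under assumptions: the mask token `[MASK]`, the representation of a sample corpus as a technical-name word list followed by a technical-description word list, and all function names are hypothetical and do not come from the application.

```python
import random

MASK = "[MASK]"  # assumed mask token; the application does not name one


def mask_name_words(name_words, description_words):
    """Method 1: one specified sample per word of the technical name."""
    samples = []
    for i in range(len(name_words)):
        masked_name = name_words[:i] + [MASK] + name_words[i + 1:]
        samples.append((masked_name + description_words, name_words[i]))
    return samples


def mask_related_words(name_words, description_words, related_words):
    """Method 2: one specified sample per related word found in the description."""
    samples = []
    for i, word in enumerate(description_words):
        if word in related_words:
            masked_desc = description_words[:i] + [MASK] + description_words[i + 1:]
            samples.append((name_words + masked_desc, word))
    return samples


def mask_random_words(words, count, rng):
    """Method 3: replace `count` randomly chosen words, one mask per word."""
    positions = set(rng.sample(range(len(words)), count))
    masked = [MASK if i in positions else w for i, w in enumerate(words)]
    targets = [words[i] for i in sorted(positions)]
    return masked, targets
```

Each function returns the masked word list together with the replaced word or words, since step 630 later compares the prediction against the replaced word.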
Step 620: Input the specified sample corpus into a pre-trained model to obtain the sample prediction result corresponding to the word mask in the specified sample corpus.
The server may input the specified sample corpus into the pre-trained model. Here, the pre-trained model may be a model obtained by training a natural language model such as BERT, T5, or mT5. Using the pre-trained model, the server may predict the words most likely to appear at the position of the word mask in the specified sample corpus, thereby obtaining the sample prediction result. The sample prediction result may include multiple sample predicted words, each associated with a matching degree between 0 and 1.
Step 630: Adjust the model parameters of the pre-trained model according to the sample prediction result corresponding to the word mask in the specified sample corpus and the replaced word, to obtain the cloze model.
For the sample prediction result corresponding to the word mask in each specified sample corpus, the server may look up, in the sample prediction result, the word that the word mask replaced, thereby obtaining the matching degree assigned to the replaced word. The server may then use a loss function to evaluate the difference between this matching degree and the target matching degree of the replaced word, and adjust the model parameters of the pre-trained model accordingly. Here, the target matching degree is 1.
After the model parameters are adjusted, the method may return to step 620 and input the specified sample corpus into the adjusted pre-trained model again. After multiple rounds of iterative training, the trained cloze model is obtained.
Through the above measures, a cloze model can be trained that outputs predicted words for the position of a word mask in a text.
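The application does not name the loss function used in step 630. As one possible reading, offered only as an assumption, the gap between the replaced word's matching degree and the target matching degree of 1 could be measured with a negative log-likelihood, which is zero exactly when the matching degree is 1. The function name `cloze_loss` and the dictionary representation of the sample prediction result are hypothetical.

```python
import math


def cloze_loss(prediction: dict[str, float], replaced_word: str,
               eps: float = 1e-12) -> float:
    """Negative log of the matching degree assigned to the replaced word.

    `prediction` maps each sample predicted word to its matching degree
    in [0, 1]. A word absent from the prediction is treated as having
    matching degree 0, clamped to `eps` to keep the logarithm finite.
    """
    matching_degree = prediction.get(replaced_word, 0.0)
    return -math.log(max(matching_degree, eps))
```

Under this choice, the loss shrinks as the model assigns the replaced word a matching degree closer to 1, which is the direction of adjustment that step 630 describes.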
In an embodiment, FIG. 7 is a schematic flowchart of a method for training the technology classification model provided by an embodiment of the present application. As shown in FIG. 7, the method may include the following steps 710 to 720.
Step 710: Input the technical description included in a sample corpus of the sample data set into a classification model to obtain the sample prediction category output by the classification model.
Here, the classification model may be one of FastText, SVM, GBDT, and similar models.
The server may input the technical description in the sample corpus into the classification model to obtain the sample prediction category output by the classification model. For a classification model that can process natural language directly, the technical description may be input as is. For a classification model that cannot, the technical description may first be converted, by means of word-vector conversion, into a corresponding multidimensional vector, which is then input into the classification model.
Step 720: Adjust the model parameters of the classification model according to the difference between the sample prediction category of the sample corpus and the technical name contained in the sample corpus, to obtain the technology classification model.
The server may use a loss function to evaluate the difference between the sample prediction category of the sample corpus and the technical name of the sample corpus itself, and adjust the model parameters of the classification model accordingly. After the adjustment, the method may return to step 710 and input the technical description in the sample corpus into the classification model again, so as to adjust the model parameters further. After multiple rounds of iterative training, the trained technology classification model is obtained.
Through the above measures, a technology classification model for classifying corpora by technology can be trained.
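The predict, compare, and adjust loop of steps 710 and 720 might be illustrated with the sketch below. It deliberately substitutes a simple multi-class perceptron over bag-of-words features for the FastText, SVM, or GBDT models named above, purely to keep the example self-contained; the function name `train_perceptron`, the update rule, and the sample data are assumptions of this sketch and not part of the application.

```python
from collections import defaultdict


def train_perceptron(samples, epochs=10):
    """Train on (description_words, technical_name) pairs; return a predictor."""
    labels = sorted({name for _, name in samples})
    # One weight per (technical name, word); unseen words weigh 0.
    weights = {label: defaultdict(float) for label in labels}

    def predict(words):
        scores = {label: sum(weights[label][w] for w in words) for label in labels}
        return max(labels, key=lambda label: scores[label])

    for _ in range(epochs):  # multiple rounds of iterative training
        for words, name in samples:
            guess = predict(words)  # step 710: sample prediction category
            if guess != name:  # step 720: adjust parameters on a mismatch
                for w in words:
                    weights[name][w] += 1.0
                    weights[guess][w] -= 1.0
    return predict
```

A usage example with two hypothetical sample corpora:

```python
predict = train_perceptron([
    (["phish", "email", "link"], "Phishing"),
    (["script", "shell", "command"], "Command and Scripting Interpreter"),
])
print(predict(["phish", "link"]))
```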
In an embodiment, a sample data set may be constructed before the cloze model or the technology classification model is trained. In response to a user operation, the server may extract content related to the specified technology from cyber threat intelligence and assemble that content into one corpus entry in the form of a technical name followed by a technical description. After stop words and invalid words are filtered out of the corpus entry, stemming is performed on the remaining words, and the extracted stems together with the other suffix-free words form a sample corpus. The server may build a sample corpus library from multiple sample corpora.
FIG. 8 is an overall schematic diagram of the method for identifying a specified technology in threat intelligence provided by an embodiment of the present application. As shown in FIG. 8, a large amount of cyber threat intelligence may first be obtained from a server, and the content related to the specified technology is extracted from it manually; in FIG. 8, the specified technology is the ATT&CK technology. Sample corpora are constructed from the extracted content, a sample database is built from multiple sample corpora, and the cloze model and the technology classification model are trained on it. Once both models are trained, they are used to extract content related to the ATT&CK technology from the threat intelligence to be examined.
In an embodiment, when the server determines, according to the judgment result corresponding to each paragraph, whether the paragraph includes the specified technology, in one case, if the judgment result corresponding to a paragraph indicates that there is a target prediction category that includes the predicted word, the server determines that the paragraph includes the specified technology. When a target prediction category includes the predicted word, it can be determined that the paragraph contains content related to the subdivided technology indicated by that target prediction category. If at least two target prediction categories include the predicted word, it is determined that the paragraph contains content related to the subdivided technology indicated by the target prediction category with the highest confidence.
In another case, if the judgment result corresponding to a paragraph indicates that no target prediction category includes the predicted word, the server determines that the paragraph does not include the specified technology.
FIG. 9 is a schematic flowchart of the method for identifying a specified technology in threat intelligence provided by another embodiment of the present application. As shown in FIG. 9, the cyber threat intelligence is divided into multiple paragraphs. For paragraph 1, the cloze model generates the predicted word W1, and the technology classification model generates the target prediction categories R1 and R2, where the confidence of R1 is S1 and the confidence of R2 is S2.
The server may judge whether W1 appears in R1 or R2. R1 and R2 are technical names indicating the target prediction categories. If paragraph 1 contains the specified technology, the cloze model can generate, from paragraph 1, a word that belongs to the technical name, and the target prediction category that the technology classification model assigns to paragraph 1 will then necessarily contain the word predicted by the cloze model.
In the first case, neither target prediction category R1 nor R2 contains the predicted word W1, which indicates that paragraph 1 does not describe content related to the ATT&CK technology.
In the second case, target prediction category R1 contains the predicted word W1 and target prediction category R2 does not, which indicates that paragraph 1 describes the subdivided technology R1 under ATT&CK.
In the third case, target prediction category R2 contains the predicted word W1 and target prediction category R1 does not, which indicates that paragraph 1 describes the subdivided technology R2 under ATT&CK.
In the fourth case, both target prediction categories R1 and R2 contain the predicted word W1, and the one with the higher confidence is the subdivided technology under ATT&CK that paragraph 1 describes.
Through the above measures, the paragraphs that contain ATT&CK-related content can be identified in the cyber threat intelligence.
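The four cases above reduce to a single rule: keep the target prediction categories whose technical name contains the predicted word, and among those pick the one with the highest confidence. A minimal sketch follows; the function name `identify_technique`, the case-insensitive whole-word matching on whitespace-separated names, and the example technical names and confidences are assumptions of this sketch.

```python
def identify_technique(predicted_word, target_categories):
    """Return the matching technical name with the highest confidence, or None.

    `target_categories` is a list of (technical_name, confidence) pairs,
    such as [(R1, S1), (R2, S2)] for paragraph 1.
    """
    word = predicted_word.lower()
    matches = [
        (name, confidence)
        for name, confidence in target_categories
        if word in name.lower().split()
    ]
    if not matches:
        # No target prediction category contains the predicted word:
        # the paragraph is taken not to describe the specified technology.
        return None
    return max(matches, key=lambda match: match[1])[0]
```

For example, with the hypothetical input `identify_technique("remote", [("Remote Services", 0.4), ("Remote Desktop Protocol", 0.5)])`, both names contain the predicted word, so the category with the higher confidence is selected.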
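FIG. 10 is a block diagram of an apparatus for identifying a specified technology in threat intelligence according to an embodiment of the present invention. As shown in FIG. 10, the apparatus may include:
a preprocessing module 1010, configured to preprocess cyber threat intelligence to obtain the word sequence corresponding to each paragraph in the cyber threat intelligence;
a first prediction module 1020, configured to, for the word sequence corresponding to each paragraph, add a word mask to the word sequence, input it into the trained cloze model, and obtain the predicted word that the cloze model outputs for the word mask;
a second prediction module 1030, configured to, for the word sequence corresponding to each paragraph, input the word sequence into the trained technology classification model, obtain the multiple prediction categories output by the technology classification model together with the confidence of each prediction category, and select the several prediction categories with the highest confidence as the target prediction categories corresponding to the paragraph, where each prediction category indicates a technical name belonging to the specified technology;
a judging module 1040, configured to judge, for each paragraph, whether any target prediction category corresponding to the paragraph includes the predicted word corresponding to the paragraph;
a determining module 1050, configured to determine, according to the judgment result corresponding to each paragraph, whether the paragraph includes the specified technology.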
For how the functions of the modules in the above apparatus are implemented, refer to the implementation of the corresponding steps in the above method for identifying a specified technology in threat intelligence; the details are not repeated here.
In the several embodiments provided in the present application, the disclosed apparatus and method may also be implemented in other ways. The apparatus embodiments described above are merely illustrative. For example, the flowcharts and block diagrams in the accompanying drawings show the architecture, functions, and operations of possible implementations of apparatuses, methods, and computer program products according to multiple embodiments of the present application. In this regard, each block in a flowchart or block diagram may represent a module, a program segment, or a portion of code, which contains one or more executable instructions for implementing the specified logical function. In some alternative implementations, the functions noted in the blocks may occur in an order different from that noted in the drawings. For example, two consecutive blocks may in fact be executed substantially in parallel, or they may sometimes be executed in the reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams and/or flowcharts, and each combination of such blocks, may be implemented by a dedicated hardware-based system that performs the specified functions or actions, or by a combination of dedicated hardware and computer instructions.
In addition, the functional modules in the embodiments of the present application may be integrated together to form an independent part, each module may exist separately, or two or more modules may be integrated to form an independent part.
If the functions are implemented in the form of software functional modules and sold or used as an independent product, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application in essence, or the part of it that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods of the embodiments of the present application. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202211387653.0A CN115687979B (en) | 2022-11-07 | 2022-11-07 | Identification methods and devices, electronic devices, and storage media for specified technologies in threat intelligence |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN115687979A (en) | 2023-02-03 |
| CN115687979B (en) | 2025-09-23 |
Family
ID=85049843
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202211387653.0A Active CN115687979B (en) | 2022-11-07 | 2022-11-07 | Identification methods and devices, electronic devices, and storage media for specified technologies in threat intelligence |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN115687979B (en) |
Also Published As
| Publication number | Publication date |
|---|---|
| CN115687979B (en) | 2025-09-23 |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| PB01 | Publication | ||
| SE01 | Entry into force of request for substantive examination | ||
| GR01 | Patent grant | ||