CN120185930B

CN120185930B - Threat information-based network threat rule generation method

Info

Publication number: CN120185930B
Application number: CN202510648405.4A
Authority: CN
Inventors: 汉京宁
Original assignee: Jiangsu Ruining Xinchuang Technology Co ltd
Current assignee: Jiangsu Ruining Xinchuang Technology Co ltd
Priority date: 2025-05-20
Filing date: 2025-05-20
Publication date: 2025-08-05
Anticipated expiration: 2045-05-20
Also published as: CN120185930A

Abstract

The present invention relates to a method for generating network threat rules based on threat intelligence, in the field of network security. The method crawls open source network threat intelligence; uses an image analysis prompt word to guide a multimodal language model to convert image-based open source network threat intelligence into text; converts the content into a unified text format to obtain initial network threat intelligence; uses a language model to assist in filtering the initial network threat intelligence; has agents identify, by voting, first-type entities and second-type entities in the filtered network threat intelligence and establish associations between them; uses a sigma rule creation prompt word to control the agent to create sigma rules based on the filtered network threat intelligence blocks and the associated first-type and second-type entities extracted from those blocks; and uses a sigma rule optimization prompt word and a sigma rule verification prompt word to have the language model used by the agent optimize and verify the generated sigma rules.

Description

Threat information-based network threat rule generation method

Technical Field

The invention relates to the technical field of network threat rule construction, in particular to a method for generating a network threat rule based on threat information.

Background

A sigma rule uses a generic signature format. Sigma rules form a set of security event detection rules used by log analysis tools to help security analysts analyze and identify abnormal network behavior. The rule set contains rules customized for different attack patterns, and each sigma rule raises an alert when its condition is met. Each sigma rule can be divided into three main parts: a header, an options part, and a detection query part. The header contains basic information about the rule, such as its id, title, description, author, and date; this information is critical to understanding the context, purpose, and source of the rule. The options part defines the context requirements of the rule, such as process creation time or the use of a specific process ID. The detection query part is the core of the rule definition. It describes the specific conditions to be detected, usually as fields in log sources and their values, which are matched against log entries. The log source may be an operating system, an application, or any other source that generates logs. These fields, combined with their values, define the conditions for detecting a security threat. Sigma rules provide conditional combinations and logical operators for building richer rule expressions: the logical operators and, or, and not connect different selection parts of a rule to describe complex detection scenarios. In addition, rule sets can form hierarchies and dependencies, so that related rules are organized into a hierarchical structure. Such a structure helps manage a large number of rules and improves their readability and maintainability; the relationships between rule sets can be inclusion or dependency.

Open source cyber threat intelligence is an important source for building sigma rules. However, it typically appears in unstructured formats and contains image information, so further manual analysis is needed to form properly formatted sigma rules. Because open source network threat intelligence is unstructured and partly image-based, conventional automated analysis that forms sigma rules through pattern-matching methods such as regular expressions is limited. With the development of natural language processing, such techniques have been applied to open source network threat intelligence analysis, using advanced language models to extract the content needed for sigma rules from threat intelligence text. However, adapting these models to the semantics of the cyber threat intelligence field requires a great deal of preprocessing and fine-tuning.

Disclosure of Invention

To solve the above technical problems, or at least partially solve them, the invention provides a network threat rule generation method based on threat intelligence.

In a first aspect, the present invention provides a method for generating a network threat rule based on threat intelligence, including:

crawling open source network threat intelligence web pages from open source network threat intelligence sources with a web crawling tool;

guiding a multimodal language model, through an image analysis prompt word, to convert image-class open source network threat intelligence in the crawled web page elements related to open source network threat intelligence into text;

converting the text content in the web page elements related to open source network threat intelligence into a unified text format to obtain initial network threat intelligence;

analyzing, through a language model, keywords that indicate redundant content in the titles of the initial network threat intelligence and, for each target title containing such a keyword, removing the redundant content from the initial network threat intelligence according to the level that the target title occupies in the text structure defined by all the titles, to obtain filtered network threat intelligence;

providing the filtered network threat intelligence to at least one agent based on a language model, wherein the agent uses the semantic analysis capability of the language model and multi-agent voting to identify first-type entities and second-type entities in the filtered network threat intelligence and to establish associations between them; the first-type entities are the entities necessary for forming the detection query part of a sigma rule and comprise API or process calls, request parameters of the API or process calls, intrusion indexes, log sources and event sources; the second-type entities provide context information of the network threat intelligence and comprise the title and description in a sigma rule, threat techniques, false positives and threat level;

using a sigma rule creation prompt word to control the language model used by the agent to create sigma rules based on the filtered network threat intelligence blocks and the associated first-type and second-type entities extracted from those blocks;

using a sigma rule optimization prompt word to control the language model used by the agent to optimize the generated sigma rules;

and using a sigma rule verification prompt word to control the language model used by the agent to verify the generated and optimized sigma rules.

Further, the image analysis prompt word defines the role of the multimodal language model as a network security analysis expert dedicated to text analysis of images from threat intelligence sources; defines the task of the multimodal language model as providing a text analysis of the image; and defines the task requirements as: giving the image description in text form; presenting the description in a format that closely matches the appearance of the image; ensuring that the output description is relevant to the image content and reflects it accurately; ensuring that the output description is complete; and not adding any information about potential uses of the data, interpretations, suggestions or opinions.

Further, in converting the text content in the web page elements related to open source network threat intelligence into a unified text format to obtain the initial network threat intelligence: the unified text format preserves the whitespace, paragraphs and code segments of the open source network threat intelligence content in the web page elements, so that the original layout of the text content is kept; the unified text format marks the titles of the web page elements, so that the marked titles can be used to build a hierarchy corresponding to the content of the original HTML page; and the unified text format preserves the structural attributes of HTML code containing tables and nested lists.

Further, providing the filtered network threat intelligence to at least one agent based on a language model, the agent using the semantic analysis capability of the language model and multi-agent voting to identify first-type entities and second-type entities in the filtered network threat intelligence and to establish associations between them, comprises:

dividing the filtered network threat intelligence into semantically complete blocks with a text segmentation tool, wherein the content of each filtered network threat intelligence block is semantically uniform;

guiding the agent, through a first-type entity extraction prompt word, to use the language model to extract the corresponding first-type entities from each filtered network threat intelligence block, wherein the agents vote on the extracted first-type entity results, and if the number of votes for any first-type entity exceeds a set vote threshold, the filtered network threat intelligence block is deemed to contain that first-type entity;

building a context block centred on each filtered network threat intelligence block from which a first-type entity was extracted, combined with a preset number of filtered network threat intelligence blocks that precede and follow it in the text;

and guiding the language model used by the agent, through a second-type entity extraction prompt word, to refer to the extracted first-type entities, extract the corresponding second-type entities from the context block, and establish the association between the second-type entities and the corresponding first-type entities.

Further, for the API or process calls among the first-type entities, an API or process call extraction prompt word guides the language model used by the agent to extract explicitly mentioned API calls or process calls from the filtered network threat intelligence block; the agents vote on the extraction results, and if the number of votes for any API or process call exceeds the set vote threshold, the filtered network threat intelligence block is deemed to contain that API or process call entity. The content of the API or process call extraction prompt word comprises: controlling the language model used by the agent to extract API or process call entities by word matching, using words related to API or process calls; and further guiding the language model used by the agent to extract API or process call entities by semantic relevance analysis of API or process calls.

Further, to enable the language model used by the agent to extract API or process call entities by semantic relevance analysis, a corresponding knowledge base is built for the agent. The knowledge base is a vectorized database that supports recognition by the language model used by the agent, and it contains semantic descriptions of network operations together with the API or process calls corresponding to each semantic description. After a filtered network threat intelligence block is provided to the language model used by the agent, the block is vectorized as a query, the top-k results most similar to the vectorized query are matched from the knowledge base, and the candidate API or process calls to be voted on are predicted from the top-k results.

Further, extracting the threat techniques among the second-type entities requires a prompt dictionary to be configured in the agent. The prompt dictionary contains a mapping from the API or process calls among the first-type entities to threat techniques, and a mapping from threat techniques to API or process calls. The language model used by the agent extracts the threat technique labels corresponding to the API or process calls according to the content of the context block and the extracted semantics, with reference to the prompt dictionary.

Further, the content of the sigma rule creation prompt word comprises: a role definition, namely a network security analysis expert dedicated to generating sigma rules from open source network threat intelligence contexts; a task definition, namely converting a network threat intelligence context containing the following elements into sigma rules: the analysis content describing attack techniques and tactics in the open source network threat intelligence, and the events in the open source network threat intelligence grouped by log source or event source, that is, the first-type entities and second-type entities related to the events; and task requirements, namely that all provided event names, IDs, event sources and the techniques and tactics corresponding to the events must be used, that omitting any key information is forbidden, that each event name appears in only one sigma rule, that details in the network threat intelligence context that can be converted into parameters are extracted, that sigma rules with the same techniques and tactics are merged, and that the generated sigma rules are strictly formatted.

Further, in optimizing the generated sigma rules by using the sigma rule optimization prompt word to control the language model used by the agent, the content of the sigma rule optimization prompt word comprises: merging detection query fields in a sigma rule that have the same detection query criteria and consistent condition logic; and splitting detection query fields that have the same detection query criteria but inconsistent condition logic.

Further, in verifying the generated and optimized sigma rules by using the sigma rule verification prompt word to control the language model used by the agent, the content of the sigma rule verification prompt word comprises: verifying, with reference to the provided sigma rule format, whether the format of the generated and optimized sigma rule meets the sigma rule format requirements; verifying, with reference to the provided open source network threat intelligence, whether the metadata in the generated and optimized sigma rule agrees with the original content of the open source network threat intelligence; and verifying, with reference to the provided open source network threat intelligence, the accuracy of the conditions in the sigma rule.

In a second aspect, the invention provides a threat information-based network threat rule generation apparatus, which comprises at least one processing unit, wherein the processing unit is connected with a storage unit through a bus unit, the storage unit stores a computer program, and the threat information-based network threat rule generation method is realized when the computer program is executed by the processing unit.

In a third aspect, the present invention provides a computer readable storage medium storing a computer program, which when executed by a processor, implements the threat intelligence based network threat rule generation method.

Compared with the prior art, the technical scheme provided by the embodiment of the invention has the following advantages:

The method crawls open source network threat intelligence; guides a multimodal language model through an image analysis prompt word to convert the crawled image-class open source network threat intelligence into text; converts the text content into a unified text format to obtain initial network threat intelligence; uses a language model to assist in filtering the initial network threat intelligence; provides the filtered network threat intelligence to at least one agent based on a language model, the agents identifying first-type entities and second-type entities in the filtered network threat intelligence by voting and establishing associations between them; and uses a sigma rule creation prompt word to control the agent to create sigma rules based on the filtered network threat intelligence blocks and the associated first-type and second-type entities extracted from those blocks. The application thus uses a multimodal language model and agents built on a language model to generate sigma rules automatically from multimodal network threat intelligence. It achieves 92% accuracy and 96% recall on the key task of extracting API or process calls among the first-type entities, and 98% accuracy and 97% recall on the key task of extracting intrusion indexes among the first-type entities. In addition, 98.28% of the generated sigma rule candidates compiled successfully.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description, serve to explain the principles of the invention.

In order to more clearly illustrate the embodiments of the invention or the technical solutions of the prior art, the drawings which are used in the description of the embodiments or the prior art will be briefly described, and it will be obvious to a person skilled in the art that other drawings can be obtained from these drawings without inventive effort.

FIG. 1 is a flowchart of a method for generating a network threat rule based on threat intelligence provided by an embodiment of the invention;

FIG. 2 is a schematic diagram of a method for generating a network threat rule based on threat information according to an embodiment of the present invention;

FIG. 3 is a flowchart, provided by an embodiment of the invention, of providing the filtered network threat intelligence to at least one agent based on a language model, the agent using the semantic analysis capability of the language model and multi-agent voting to identify first-type entities and second-type entities in the filtered network threat intelligence and to establish associations between them;

FIG. 4 is a schematic diagram of a network threat rule generation apparatus based on threat intelligence provided by an embodiment of the invention.

Detailed Description

For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in the process, method, article, or apparatus that comprises the element.

Example 1

As shown in FIG. 1 and FIG. 2, an embodiment of the invention provides a method for generating a network threat rule based on threat intelligence, comprising:

S100, crawling open source network threat intelligence web pages from open source network threat intelligence sources with a web crawling tool.

The web crawling tool crawls the web page code of the open source network threat intelligence source, inspects the web page elements, discards the elements that are irrelevant to open source network threat intelligence, and identifies the elements that are relevant to it.
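The element filtering in S100 can be illustrated with a minimal sketch. It assumes the page HTML has already been fetched, and it treats a fixed set of tags (the hypothetical `IRRELEVANT_TAGS`) as irrelevant to threat intelligence; the source does not specify which elements are discarded or how they are inspected, so both the tag list and the helper names are illustrative assumptions.

```python
from html.parser import HTMLParser

# Assumption: elements with these tags rarely carry threat intelligence content.
IRRELEVANT_TAGS = {"script", "style", "nav", "header", "footer", "aside", "form"}


class RelevantTextExtractor(HTMLParser):
    """Collects text that is not nested inside an irrelevant element."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # number of irrelevant elements currently open
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in IRRELEVANT_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in IRRELEVANT_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.chunks.append(text)


def extract_relevant_text(html: str) -> list[str]:
    """Return the text chunks found outside irrelevant elements."""
    parser = RelevantTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.chunks


page = (
    "<html><nav>Home | About</nav>"
    "<article><h1>Report</h1><p>The malware calls GetNetUser.</p></article>"
    "<footer>Copyright</footer></html>"
)
relevant = extract_relevant_text(page)
```

A real crawler would likely combine such tag filtering with site-specific selectors, since navigation and advertising markup varies between intelligence sources.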

S200, guiding the multimodal language model, through the image analysis prompt word, to convert image-class open source network threat intelligence in the web page elements related to open source network threat intelligence into text.

In addition to text-class open source network threat intelligence, the web page elements may also carry image-class open source network threat intelligence. The application uses the image analysis prompt word to control the multimodal language model to convert the image-class open source network threat intelligence into text. An exemplary image analysis prompt word comprises the following. Role: a network security analysis expert dedicated to text analysis of images from threat intelligence sources. Task: provide a text analysis of the supplied image. Task requirements: give the image description in text form, with the main focus on extracting information from the image that helps in understanding the attack; present the description in a format that matches the content of the image; ensure that the output description is relevant to the image content, reflects it accurately, and is complete; do not add any information about potential uses of the data, interpretations, suggestions or opinions. Exemplary multimodal language models include Qwen-VL, CogVLM2, CogAgent and GPT-4o.
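As an illustration of how such an image analysis prompt word might be packaged for a multimodal model, the following sketch assembles the role, task and task requirements into one prompt string and bundles it with a base64-encoded image. The wording of the prompt paraphrases the description above; the request layout returned by the hypothetical `build_image_request` is an assumption and does not correspond to the API of any particular model.

```python
import base64

# Prompt text paraphrased from the role, task and requirements described above.
IMAGE_ANALYSIS_PROMPT = "\n".join([
    "Role: You are a network security analysis expert dedicated to text "
    "analysis of images from threat intelligence sources.",
    "Task: Provide a text analysis of the supplied image.",
    "Requirements:",
    "- Give the image description in text form, focusing on information "
    "that helps in understanding the attack.",
    "- Present the description in a format that matches the content of the image.",
    "- Make sure the description is relevant to the image, accurate and complete.",
    "- Do not add information about potential uses of the data, "
    "interpretations, suggestions or opinions.",
])


def build_image_request(image_bytes: bytes, mime_type: str = "image/png") -> dict:
    """Bundle the prompt with an encoded image.

    The dictionary layout is hypothetical; adapt it to the client library
    of whichever multimodal model is actually used.
    """
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "prompt": IMAGE_ANALYSIS_PROMPT,
        "image": {"mime_type": mime_type, "data_base64": encoded},
    }


request = build_image_request(b"\x89PNG")
```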

S300, converting the text content in the web page elements related to open source network threat intelligence into a unified text format to obtain the initial network threat intelligence. In this process, the unified text format preserves the whitespace, paragraphs and code segments of the open source network threat intelligence content in the web page elements, so that the original layout of the text content is kept; the unified text format marks the titles of the web page elements, so that the marked titles can be used to build a hierarchy corresponding to the content of the original HTML page; and the unified text format preserves the structural attributes of HTML code containing tables and nested lists.
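A minimal sketch of such a conversion is given below. It assumes a Markdown-like target in which titles are marked with `#` characters according to their level and nested lists are indented; the source does not name the unified text format, so this choice is an assumption. Tables and code segments, which the unified format also preserves, are omitted to keep the sketch short.

```python
from html.parser import HTMLParser


def _is_heading(tag: str) -> bool:
    return len(tag) == 2 and tag[0] == "h" and tag[1] in "123456"


class UnifiedTextConverter(HTMLParser):
    """Turns headings, paragraphs and nested lists into marked text lines."""

    def __init__(self):
        super().__init__()
        self.lines = []
        self._prefix = None   # line prefix of the block being read, if any
        self._buffer = []     # text pieces collected for the current block
        self._list_depth = 0

    def _flush(self):
        if self._prefix is not None and self._buffer:
            self.lines.append(self._prefix + " ".join(self._buffer))
        self._prefix, self._buffer = None, []

    def _start_block(self, prefix: str):
        self._flush()
        self._prefix = prefix

    def handle_starttag(self, tag, attrs):
        if tag in ("ul", "ol"):
            self._flush()  # close the parent list item before nesting
            self._list_depth += 1
        elif _is_heading(tag):
            self._start_block("#" * int(tag[1]) + " ")
        elif tag == "p":
            self._start_block("")
        elif tag == "li":
            self._start_block("  " * max(self._list_depth - 1, 0) + "- ")

    def handle_endtag(self, tag):
        if tag in ("ul", "ol"):
            self._flush()
            self._list_depth = max(self._list_depth - 1, 0)
        elif tag in ("p", "li") or _is_heading(tag):
            self._flush()

    def handle_data(self, data):
        text = data.strip()
        if text and self._prefix is not None:
            self._buffer.append(text)


def to_unified_text(html: str) -> str:
    converter = UnifiedTextConverter()
    converter.feed(html)
    converter.close()
    converter._flush()
    return "\n".join(converter.lines)


sample = (
    "<h1>Report</h1><p>Intro text.</p><h2>Details</h2>"
    "<ul><li>Stage one<ul><li>Dropper</li></ul></li><li>Stage two</li></ul>"
)
unified = to_unified_text(sample)
```

Marking the title level explicitly is what later allows whole sections to be located and removed by level.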

S400, analyzing, through a language model, keywords that indicate redundant content in the titles of the initial network threat intelligence and, for each target title containing such a keyword, removing the redundant content from the initial network threat intelligence according to the level that the target title occupies in the text structure defined by all the titles, to obtain the filtered network threat intelligence.

In the implementation, the language model identifies and locates keywords that indicate redundant content in the titles at every level, such as abstract, introduction, overview and conclusion. The level of the title containing the keyword is determined, and that title is deleted together with the redundant content beneath it at that level. The content under titles such as abstract, introduction, overview and conclusion is usually a summary of more specific network threat intelligence content; it does not contribute to the generation of sigma rules and is therefore redundant.
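The level-based removal can be sketched as follows. In the method the keywords are identified by a language model; here a fixed keyword tuple stands in for that step, and titles are assumed to be marked with leading `#` characters. Both are assumptions made for illustration.

```python
import re

# Stand-in for the language model's keyword analysis (assumption).
REDUNDANT_KEYWORDS = ("abstract", "introduction", "overview", "conclusion")
HEADING = re.compile(r"^(#+)\s+(.*)$")


def drop_redundant_sections(lines: list[str]) -> list[str]:
    """Remove each redundant title together with everything beneath it."""
    kept = []
    skip_level = None  # level of the redundant title being removed, if any
    for line in lines:
        match = HEADING.match(line)
        if match:
            level = len(match.group(1))
            title = match.group(2).lower()
            if skip_level is not None and level > skip_level:
                continue  # sub-title inside the removed section
            skip_level = None  # a title at the same or a higher level ends it
            if any(keyword in title for keyword in REDUNDANT_KEYWORDS):
                skip_level = level
                continue
        elif skip_level is not None:
            continue  # body text inside the removed section
        kept.append(line)
    return kept


document = [
    "# Report",
    "## Overview",
    "Short recap.",
    "### Background",
    "More recap.",
    "## Technical Analysis",
    "Calls GetNetUser.",
    "## Conclusion",
    "Wrap-up.",
]
filtered = drop_redundant_sections(document)
```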

S500, providing the filtered network threat intelligence to at least one agent based on a language model, wherein the agent uses the semantic analysis capability of the language model and multi-agent voting to identify first-type entities and second-type entities in the filtered network threat intelligence and to establish associations between them. The first-type entities are the entities necessary for forming the detection query part of a sigma rule; they comprise API or process calls, request parameters of the API or process calls, intrusion indexes (indicators of compromise), log sources and event sources. The second-type entities provide context information of the network threat intelligence; they comprise the title and description in a sigma rule, threat techniques, false positives and threat level.

In the specific implementation process, as shown in fig. 3, step S500 includes the following steps:

S501, dividing the filtered network threat intelligence into semantically complete blocks with a text segmentation tool. The content of each filtered network threat intelligence block is semantically uniform, and working block by block helps the language model extract the first-type and second-type entities more accurately.

S502, guiding the agent, through the first-type entity extraction prompt word, to use the language model to extract the corresponding first-type entities from each filtered network threat intelligence block. The agents vote on the extracted first-type entity results, and if the number of votes for any first-type entity exceeds a set vote threshold, the filtered network threat intelligence block is deemed to contain that first-type entity.

Taking API or process calls as an example: for the API or process calls among the first-type entities, the API or process call extraction prompt word guides the language model used by the agent to extract explicitly mentioned API calls or process calls from the filtered network threat intelligence block. The agents vote on the extraction results, and if the number of votes for any API or process call exceeds the set vote threshold, the filtered network threat intelligence block is deemed to contain that API or process call entity. The content of the API or process call extraction prompt word comprises controlling the language model to extract API or process call entities by word matching, using words related to API or process calls such as Get, call, request, create and register, and further guiding the language model to extract API or process call entities by semantic relevance analysis of API or process calls. To enable extraction by semantic relevance analysis, the application builds a corresponding knowledge base for the agent. The knowledge base is a vectorized database that supports recognition by the language model, and it contains semantic descriptions of network operations together with the API or process calls corresponding to each semantic description. For example, for the semantic description "the attacker uses xxx to download a malicious payload", the API calls involved may be "urllib.request.urlretrieve" or "requests.get".

After a filtered network threat intelligence block is provided to the language model of the agent, the block is vectorized as a query, the language model matches the top-k results most similar to the vectorized query from the knowledge base, and the candidate API or process calls to be voted on are predicted from the top-k results.
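The top-k lookup can be sketched as below. The sketch replaces the vectorized database with an in-memory list and replaces a learned embedding with a toy bag-of-words vector compared by cosine similarity; both substitutions, the knowledge base entries, and the function names are assumptions for illustration only.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy bag-of-words vector; a real system would use an embedding model."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[token] * b[token] for token in a if token in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


# Illustrative entries: (semantic description, corresponding API or process calls).
KNOWLEDGE_BASE = [
    ("attacker downloads a malicious payload from a url",
     ["urllib.request.urlretrieve", "requests.get"]),
    ("attacker enumerates domain user accounts", ["GetNetUser"]),
    ("attacker creates a scheduled task for persistence", ["schtasks.exe"]),
]


def top_k_candidates(block: str, k: int = 2) -> list[str]:
    """Return candidate calls from the k descriptions most similar to the block."""
    query = embed(block)
    ranked = sorted(
        KNOWLEDGE_BASE,
        key=lambda entry: cosine(query, embed(entry[0])),
        reverse=True,
    )
    candidates = []
    for _, calls in ranked[:k]:
        for call in calls:
            if call not in candidates:
                candidates.append(call)
    return candidates


block = "The script downloads a second-stage payload from a remote url"
candidates = top_k_candidates(block, k=1)
```

The candidates returned here are only proposals; in the method they still have to pass the vote before they count as extracted entities.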

In one example, 9 agents are used and the vote threshold is set to 5. If 6 of the 9 agents extract the GetNetUser call from a filtered network threat intelligence block, the GetNetUser call receives 6 votes, which exceeds the vote threshold of 5. In another example, a single agent is used and the vote threshold is set to 5; the agent's entity extraction is run 9 times on a filtered network threat intelligence block, and if the GetNetUser call is extracted in 6 of the runs, it receives 6 votes.

In a preferred example, because extraction of API or process call entities by semantic relevance analysis is less accurate, a higher vote threshold is set for voting on entities extracted in this way, to improve accuracy.
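The voting rule can be written as a short sketch. Each element of `extractions` is assumed to be the list of entities returned by one agent, or by one of several runs of the same agent; the function name and the second entity in the example are illustrative.

```python
from collections import Counter


def vote(extractions: list[list[str]], threshold: int) -> dict[str, int]:
    """Keep entities whose vote count strictly exceeds the threshold."""
    votes = Counter()
    for entities in extractions:
        votes.update(set(entities))  # one vote per agent (or run) per entity
    return {entity: count for entity, count in votes.items() if count > threshold}


# Nine extraction results for one block; six of them contain GetNetUser.
runs = (
    [["GetNetUser"]] * 4
    + [["GetNetUser", "CreateProcess"]] * 2
    + [[]] * 3
)
accepted = vote(runs, threshold=5)
```

With a threshold of 5, GetNetUser is accepted with 6 votes while the entity seen in only 2 runs is discarded. Raising the threshold for entities obtained by semantic relevance analysis only requires passing a larger `threshold` value.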

S503, building a context block centred on each filtered network threat intelligence block from which a first-type entity was extracted, combined with a preset number of filtered network threat intelligence blocks that precede and follow it in the text.
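A minimal sketch of the context block construction follows. The `window` parameter stands for the preset number of neighbouring blocks, and joining the blocks with blank lines is an assumption about how they are concatenated.

```python
def build_context_block(blocks: list[str], center: int, window: int = 1) -> str:
    """Join the centre block with up to `window` blocks on each side."""
    start = max(0, center - window)
    end = min(len(blocks), center + window + 1)
    return "\n\n".join(blocks[start:end])


blocks = ["b0", "b1", "b2", "b3", "b4"]
context = build_context_block(blocks, center=2, window=1)
```

Clamping the range at the document boundaries means that a centre block at the start or end of the text simply receives fewer neighbours.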

S504, guiding the language model used by the agent, through the second-type entity extraction prompt word, to refer to the extracted first-type entities and extract the corresponding second-type entities from the context block. The application builds the context block from a set number of filtered network threat intelligence blocks before and after the centre block in the text, extracts the second-type entities from the context block, and establishes their association with the corresponding first-type entities.

In the implementation, the language model used by the agent can extract the title, description, false positives and threat level among the second-type entities on the basis of its semantic analysis capability. Extracting the threat techniques among the second-type entities requires a prompt dictionary to be configured in the agent. The prompt dictionary contains a mapping from API or process calls to threat techniques and a mapping from threat techniques to API or process calls. The language model used by the agent extracts the threat technique labels corresponding to the API or process calls according to the content of the context block and the extracted semantics, with reference to the prompt dictionary.
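The two mappings in the prompt dictionary can be sketched as a forward table and its inverse. The entries below are illustrative assumptions that use MITRE ATT&CK technique identifiers as labels; the source does not state which labelling scheme or which entries the prompt dictionary contains.

```python
from collections import defaultdict

# Illustrative mapping from API or process calls to threat technique labels.
API_TO_TECHNIQUES = {
    "GetNetUser": ["T1087"],                  # Account Discovery
    "urllib.request.urlretrieve": ["T1105"],  # Ingress Tool Transfer
    "requests.get": ["T1105"],
}


def invert(mapping: dict[str, list[str]]) -> dict[str, list[str]]:
    """Build the reverse mapping from threat techniques to calls."""
    inverse = defaultdict(list)
    for call, techniques in mapping.items():
        for technique in techniques:
            inverse[technique].append(call)
    return dict(inverse)


TECHNIQUE_TO_APIS = invert(API_TO_TECHNIQUES)


def dictionary_hints(extracted_calls: list[str]) -> list[str]:
    """Format the entries relevant to the extracted calls as prompt hints."""
    return [
        f"{call} -> {technique}"
        for call in extracted_calls
        for technique in API_TO_TECHNIQUES.get(call, [])
    ]


hints = dictionary_hints(["GetNetUser", "UnknownCall"])
```

Such hints would be appended to the second-type entity extraction prompt word, so that the language model labels techniques with reference to the dictionary instead of relying on its own recall alone.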

In one embodiment, the second-type entities are also extracted through voting; the voting principle is the same as for the first-type entities and is not repeated here.

S600, using the sigma rule creation prompt word to control the language model used by the agent to create sigma rules based on the filtered network threat intelligence blocks and the associated first-type and second-type entities extracted from those blocks.

The content of the sigma rule creation prompt word comprises:

Role: a network security analysis expert dedicated to generating sigma rules from open source network threat intelligence contexts.

Task: convert a network threat intelligence context containing the following elements into sigma rules: the analysis content describing attack techniques and tactics in the open source network threat intelligence, and the events in the open source network threat intelligence grouped by log source or event source, that is, the first-type entities and second-type entities related to the events.

Task requirements: all provided event names, IDs, event sources and the techniques and tactics corresponding to the events must be used; omitting any key information is forbidden; each event name appears in only one sigma rule; details in the network threat intelligence context that can be converted into parameters are extracted; sigma rules with the same techniques and tactics are merged; and the generated sigma rules are strictly formatted.

In a specific implementation, the filtered network threat intelligence block and the associated first-category and second-category entities extracted from it are provided, together with the sigma rule creation prompt word, to the language model used by the agent. Based on these inputs, the language model creates a preliminary sigma rule according to the content of the sigma rule creation prompt word.
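Assembling the input for the language model might look like the following sketch. The exact prompt wording, the JSON serialization of the entities, and the names `ROLE`, `REQUIREMENTS`, and `build_creation_prompt` are assumptions made for illustration; only the role, task, and requirement content is paraphrased from the description above.

```python
import json
from typing import Dict, List

ROLE = ("You are a network security analysis expert who generates sigma rules "
        "from open source network threat intelligence context.")

REQUIREMENTS = [
    "Use every provided event name, ID, event source, and technique or tactic.",
    "Do not omit any key information.",
    "Let each event name appear in only one sigma rule.",
    "Extract details that can be converted into parameters.",
    "Merge sigma rules that share the same techniques and tactics.",
    "Format the generated sigma rules strictly.",
]


def build_creation_prompt(block: str,
                          first_entities: Dict[str, List[str]],
                          second_entities: Dict[str, str]) -> str:
    """Combine role, task, requirements, block, and entities into one prompt."""
    numbered = "\n".join(f"{i}. {req}" for i, req in enumerate(REQUIREMENTS, 1))
    return "\n\n".join([
        f"Role: {ROLE}",
        "Task: convert the threat intelligence context below into sigma rules.",
        f"Task requirements:\n{numbered}",
        f"Threat intelligence block:\n{block}",
        "First-category entities:\n" + json.dumps(first_entities, indent=2),
        "Second-category entities:\n" + json.dumps(second_entities, indent=2),
    ])


prompt = build_creation_prompt(
    "The loader injects code with CreateRemoteThread.",
    {"api_or_process_calls": ["CreateRemoteThread"], "log_sources": ["windows"]},
    {"title": "Remote thread injection", "threat_level": "high"},
)
```

The resulting string would then be sent to whichever language model the agent uses; that call is omitted here because the description does not fix a particular model interface.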

S700: the sigma rule optimization prompt word is used to control the language model used by the agent to optimize the generated sigma rules. The content of the sigma rule optimization prompt word comprises: merging detection query fields in a sigma rule that have the same detection query criteria and consistent condition logic; and splitting detection query fields that have the same detection query criteria but inconsistent condition logic. In some generated sigma rules, conditions with the same detection query criteria and the same condition logic are scattered across different detection query fields; these fields are merged through the language analysis capability of the language model used by the agent. In other generated sigma rules, conditions with the same detection query criteria but inconsistent condition logic are placed in the same detection query field; such fields are split through the language analysis capability of the language model used by the agent, and the condition logic is adjusted accordingly.
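The description leaves the merge to the language model, but the intended effect of the merging case can be illustrated deterministically. The sketch below rests on several assumptions: "same detection query criteria" is read as "same field name including modifier", the selections are combined with OR in the rule condition (for example `1 of selection_*`), and only single-field selections are merged, because a list of values under one field is OR-ed in Sigma. The function name `merge_or_selections` is hypothetical.

```python
from typing import Dict, List

Selection = Dict[str, List[str]]


def merge_or_selections(selections: Dict[str, Selection]) -> Dict[str, Selection]:
    """Merge single-field selections that test the same field.

    Assumes the selections are OR-ed in the rule condition, so pooling
    their values under one field preserves the rule's meaning.
    """
    merged: Dict[str, Selection] = {}
    owner_by_field: Dict[str, str] = {}  # field -> selection that absorbs it
    for name, selection in selections.items():
        if len(selection) != 1:
            # Multi-field selections AND their fields together; leave them alone.
            merged[name] = {f: list(v) for f, v in selection.items()}
            continue
        (field, values), = selection.items()
        target = owner_by_field.setdefault(field, name)
        bucket = merged.setdefault(target, {field: []})[field]
        for value in values:
            if value not in bucket:  # drop duplicates, keep first-seen order
                bucket.append(value)
    return merged


detection = {
    "selection_a": {"Image|endswith": ["\\cmd.exe"]},
    "selection_b": {"Image|endswith": ["\\powershell.exe", "\\cmd.exe"]},
    "selection_c": {"Image|endswith": ["\\wscript.exe"],
                    "CommandLine|contains": ["-enc"]},
}
optimized = merge_or_selections(detection)
```

The merged selection keeps the name of the first selection it absorbed, so a wildcard condition such as `1 of selection_*` continues to match. The splitting case depends on how the condition logic differs and is not sketched here.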

S800: the sigma rule verification prompt word is used to control the language model used by the agent to verify the generated and optimized sigma rules. The content of the sigma rule verification prompt word comprises: verifying, with reference to the provided sigma rule format, whether the format of the generated and optimized sigma rule meets the sigma rule format requirements; verifying, with reference to the provided open source network threat intelligence, whether the metadata in the generated and optimized sigma rule is consistent with the content of the original open source network threat intelligence; and verifying, with reference to the provided open source network threat intelligence, the accuracy of the conditions in the sigma rule.
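The first of the three verification items, the format check, can also be approximated without a language model. The sketch below is an assumption about what such a check might cover, not the verification performed in the patent: it takes a rule already parsed into a dictionary and tests only for the `title`, `logsource`, and `detection` fields and for a `condition` inside `detection`. The name `check_sigma_format` is hypothetical.

```python
from typing import Any, Dict, List

REQUIRED_TOP_LEVEL = ("title", "logsource", "detection")


def check_sigma_format(rule: Dict[str, Any]) -> List[str]:
    """Return a list of format problems; an empty list means none were found."""
    problems = [f"missing field: {key}"
                for key in REQUIRED_TOP_LEVEL if key not in rule]
    logsource = rule.get("logsource")
    if logsource is not None and not isinstance(logsource, dict):
        problems.append("logsource must be a mapping")
    detection = rule.get("detection")
    if isinstance(detection, dict):
        if "condition" not in detection:
            problems.append("detection has no condition")
        if not any(key != "condition" for key in detection):
            problems.append("detection has no selection")
    elif detection is not None:
        problems.append("detection must be a mapping")
    return problems


good_rule = {
    "title": "Remote thread injection",
    "logsource": {"product": "windows", "category": "create_remote_thread"},
    "detection": {
        "selection": {"TargetImage|endswith": ["\\lsass.exe"]},
        "condition": "selection",
    },
}
bad_rule = {"title": "Incomplete rule", "detection": {"condition": "selection"}}
```

The metadata and condition accuracy checks compare the rule against the source intelligence and are the part that relies on the language model's reading of the text.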

The language model used solely for the filtering task and the language model used by the agent may be the same or different. Examples include gml, deepseek, and qwen.

Example 2

Referring to fig. 4, an embodiment of the present invention provides a threat intelligence-based network threat rule generation apparatus that includes at least one processing unit connected to a storage unit through a bus unit. The storage unit serves as a computer-readable storage medium and can store the software programs, computer-executable programs, and modules corresponding to the threat intelligence-based network threat rule generation method in the embodiments of the present invention. The processing unit executes the software programs, computer-executable programs, and modules stored in the storage unit to implement the threat intelligence-based network threat rule generation method, which comprises the following steps:

Guiding, through the image analysis prompt word, a multimodal language model to convert the image-based open source network threat intelligence in the web page elements related to open source network threat intelligence into text-based open source network threat intelligence;

Using the sigma rule creation prompt word to control the language model used by the agent to create sigma rules based on the filtered network threat intelligence block and the associated first-category and second-category entities extracted from that block.

Of course, the programs stored in the storage unit of the apparatus provided by this embodiment are not limited to the method operations described above; they can also perform the related operations in the threat intelligence-based network threat rule generation method provided by any embodiment of the present invention.

Example 3

An embodiment of the present invention provides a computer-readable storage medium storing a computer program which, when executed, implements the threat intelligence-based network threat rule generation method, comprising the following steps:

The computer program stored on the computer-readable storage medium according to this embodiment of the present invention is not limited to the method operations described above; it can also perform the related operations in the threat intelligence-based network threat rule generation method according to any embodiment of the present invention.

In the embodiments provided in the present invention, it should be understood that the disclosed structures and methods may be implemented in other manners. The structural embodiments described above are merely illustrative. For example, the division of the units is merely a division by logical function, and other divisions are possible in an actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection via interfaces, structures, or units, and may be electrical, mechanical, or in other forms.

The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

In addition, each functional unit in the embodiments of the present invention may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.

The foregoing is only a specific embodiment of the invention to enable those skilled in the art to understand or practice the invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims

1. A method for generating network threat rules based on threat intelligence, comprising:

Use web crawling tools to crawl open source threat intelligence web pages from open source threat intelligence sources;

The multimodal language model is guided by image analysis prompt words to convert the image-based open source network threat intelligence in the crawled web page elements related to open source network threat intelligence into text-based open source network threat intelligence;

Convert the textual content in web page elements related to open source cyber threat intelligence into a unified text format to obtain initial cyber threat intelligence;

The language model is used to analyze keywords representing redundant content in the initial network threat intelligence titles. For target titles containing keywords representing redundant content, the redundant content in the initial network threat intelligence is removed based on the position of the target title in the text structure hierarchy of all titles to obtain filtered network threat intelligence.

The filtered network threat intelligence is provided to at least one language model-based intelligent agent, which uses the semantic analysis capability of the language model and multi-agent voting to identify first-category entities and second-category entities from the filtered network threat intelligence and establish connections. The first-category entities are necessary to form the query part of the Sigma rule detection. The first-category entities include: API or process calls, request parameters of API or process calls, intrusion indicators, log sources, and event sources. The second-category entities provide contextual information for the network threat intelligence. The second-category entities include the title and description in the Sigma rule, threat techniques and tactics, false positives, and threat levels.

The sigma rule creation prompt words are used to control the language model used by the intelligent agent to create sigma rules based on the filtered network threat intelligence block and the associated first-category entities and second-category entities extracted from the network threat intelligence block;

The sigma rule optimization prompt words are used to control the language model used by the intelligent agent to optimize the generated sigma rules;

The sigma rule verification prompt words are used to control the language model used by the intelligent agent to verify the generated and optimized sigma rules.

2. The method for generating network threat rules based on threat intelligence according to claim 1 is characterized in that the image analysis prompt word defines a multimodal language model as a network security analysis expert, which is specifically used to perform text analysis on images from threat intelligence sources; the task of the multimodal language model is defined to provide text analysis of images; and the task requirements are defined to include: giving an image description in the form of text, presenting the image description in a format that closely matches the appearance of the image, ensuring that the output description is relevant to and accurately reflects the image content, and ensuring that the output description is complete and does not add any potential uses or explanations, suggestions or opinions about the data.

3. The method for generating network threat rules based on threat intelligence according to claim 1 is characterized in that the textual content in the web page elements related to the open source network threat intelligence is converted into a unified text format to obtain the initial network threat intelligence. The unified text format maintains the spaces, paragraphs and code segments of the open source network threat intelligence content in the web page elements to retain the original layout of the textual content; the unified text format marks the titles of the web page elements to use the marked titles to construct a corresponding hierarchical structure by maintaining the content in the original HTML page; for HTML code containing tables and nested lists, the unified text format retains their structural attributes.

4. The method for generating network threat rules based on threat intelligence according to claim 1, wherein providing the filtered network threat intelligence to at least one language model-based agent, wherein the agent utilizes the semantic analysis capability of the language model and multi-agent voting to identify the first category of entities and the second category of entities from the filtered network threat intelligence and establish a relationship therebetween comprises:

The filtered network threat intelligence is segmented into semantically complete blocks using text segmentation tools. The content within each filtered network threat intelligence block is semantically unified.

The first-category entity extraction prompt words guide the language model used by the agent to extract the corresponding first-category entities from the filtered network threat intelligence block. The agent votes based on the first-category entity results it has extracted. If the number of votes for any first-category entity exceeds the set vote threshold, that first-category entity is accepted as an entity contained in the filtered network threat intelligence block.

Centering on the filtered network threat intelligence block from which the first-category entities were extracted, a context block is constructed by combining a set number of filtered network threat intelligence blocks that precede and follow it in the text;

The second-category entity extraction prompt words are used to guide the language model used by the intelligent agent to refer to the extracted first-category entities to extract corresponding second-category entities from the context block and build connections between them and the corresponding first-category entities.

5. The method for generating network threat rules based on threat intelligence according to claim 4 is characterized in that, for API or process calls in the first category of entities, the language model used by the intelligent agent is guided by API or process call extraction prompt words to extract the explicitly mentioned API calls or process calls from the filtered network threat intelligence block, and the intelligent agent votes based on the API or process call extraction results extracted by itself. If the number of votes for any API or process call exceeds a set vote threshold, the API or process call entity is included in the filtered network threat intelligence block; wherein the content of the API or process call extraction prompt words includes: controlling the language model used by the intelligent agent to use the words involved in the API or process call to extract the API or process call entity in a word matching manner; and further guiding the language model used by the intelligent agent to extract the API or process call entity in a way of API or process call semantic relevance analysis.

6. The method for generating network threat rules based on threat intelligence according to claim 5 is characterized in that, in order to guide the language model used by the intelligent agent to extract API or process call entities in accordance with the API or process call semantic correlation analysis, a corresponding knowledge base is constructed for the intelligent agent, and the knowledge base is a vectorized database that supports the recognition of the language model used by the intelligent agent, and the knowledge base includes semantic descriptions of network operations and API or process calls corresponding to the semantic descriptions of network operations; after providing the filtered network threat intelligence block to the language model used by the intelligent agent, the filtered network threat intelligence block is vectorized as a query, and the language model used by the intelligent agent matches the topk results most similar to the vectorized query from the knowledge base, and predicts the candidate API or process calls to be voted based on the topk results.

7. The method for generating network threat rules based on threat intelligence according to claim 5 is characterized in that the extraction of threat techniques and tactics from the second type of entities requires configuring a prompt dictionary in the intelligent agent, wherein the prompt dictionary contains a mapping between API or process calls and threat techniques and tactics in the first type of entities, and a mapping between threat techniques and tactics and API or process calls. The language model used by the intelligent agent extracts the threat technique and tactic labels corresponding to the API or process call with reference to the prompt dictionary based on the content of the context block and the extracted semantics.

8. The method for generating network threat rules based on threat intelligence according to claim 1 is characterized in that the content of the Sigma rule creation prompt words includes: defining a role: as a network security analysis expert, specifically used to generate Sigma rules from the context of open source network threat intelligence; defining a task: converting the network threat intelligence context containing the following elements into Sigma rules: analysis content describing attack techniques and tactics in open source network threat intelligence; events in open source network threat intelligence grouped by log sources or event sources, and the first and second category entities involved in the events; defining task requirements: all provided event names, IDs, event sources and techniques and tactics corresponding to the events must be used; no key information is allowed to be omitted; each event name only appears in one Sigma rule; extracting details that can be converted into parameters in the network threat intelligence context; merging Sigma rules with the same techniques and tactics; the generated Sigma rules are strictly formatted.

9. The method for generating network threat rules based on threat intelligence according to claim 1 is characterized in that, in the process of optimizing the generated sigma rules using the language model used by the sigma rule optimization prompt word control agent, the content of the sigma rule optimization prompt word used includes: merging detection query fields with the same detection query criteria and consistent conditional logic in the sigma rule; and splitting detection query fields with the same detection query criteria but inconsistent conditional logic.

10. The method for generating network threat rules based on threat intelligence according to claim 1 is characterized in that, in the process of using the sigma rule verification prompt word to control the language model used by the intelligent agent to verify the generated and optimized sigma rules, the content of the sigma rule verification prompt word used includes: referring to the provided sigma rule format to verify whether the format of the generated and optimized sigma rule is true and meets the sigma rule format requirements; referring to the provided open source network threat intelligence to verify whether the metadata in the generated and optimized sigma rule meets the original open source network threat intelligence content; and referring to the provided open source network threat intelligence to verify the accuracy of the conditions in the sigma rule.