CN116095035A - Method and device for acquiring context information of mailbox

Info

Publication number: CN116095035A
Application number: CN202211686067.6A
Authority: CN
Inventors: 白敏; 黄朝文; 李敏; 汪列军; 王胜利; 万文杰
Original assignee: Secworld Information Technology Beijing Co Ltd; Qax Technology Group Inc
Current assignee: Secworld Information Technology Beijing Co Ltd; Qax Technology Group Inc
Priority date: 2022-12-27
Filing date: 2022-12-27
Publication date: 2023-05-09

Abstract

This application provides a method and device for obtaining mailbox context information. The method includes: obtaining a mailbox entity to be identified; matching the mailbox entity to be identified against the mailbox entities in a mailbox reputation intelligence library, where the library contains various mailbox entities and their context information, and the context information held in the library for a given mailbox entity is more extensive than what can be extracted from that mailbox entity alone; if the match succeeds, outputting the mailbox entity in the library that matches the mailbox entity to be identified, together with its context information; if the match fails, outputting an empty result. Based on the rich context information obtained, analysts can trace the source of the mailbox entity to be identified efficiently and accurately.

Description

Method and device for obtaining mailbox context information

Technical Field

The present application relates to the field of network security technology, and in particular to a method and device for acquiring mailbox context information, and a method and device for generating a mailbox reputation intelligence library.

Background Art

In security analysis scenarios, analysts need to track and locate attack entities using reputation assessment tools. The most typical attack entity is the mailbox entity, because most attackers frequently use phishing mailbox entities, malicious external mailbox entities, and the like to attack and obtain user data. Reputation assessment of mailbox entities has therefore become an important part of security analysis.

At present, reputation assessment of a mailbox entity mainly works as follows: the analyst inputs the mailbox entity to be assessed into a reputation assessment tool, and the tool analyzes the mailbox entity and outputs an analysis result. The result generally indicates whether the mailbox entity is malicious and includes some basic context information extracted from the mailbox entity. If the result shows that the mailbox entity is malicious, the analyst then needs to trace its source based on this context information.

However, because the context information that the reputation assessment tool provides is fairly basic, analysts cannot trace the source very accurately from it. Sometimes analysts even have to manually correlate related information across more dimensions for the malicious mailbox entity, which reduces the efficiency and accuracy of tracing.

Summary of the Invention

The purpose of the embodiments of the present application is to provide a method and device for acquiring mailbox context information, and a method and device for generating a mailbox reputation intelligence library, so as to improve the efficiency and accuracy with which analysts trace sources.

To solve the above technical problems, the embodiments of the present application provide the following technical solutions:

In a first aspect, the present application provides a method for acquiring mailbox context information, the method comprising: acquiring a mailbox entity to be identified; matching the mailbox entity to be identified against the mailbox entities in a mailbox reputation intelligence library, wherein the mailbox reputation intelligence library contains various mailbox entities and their context information, and the context information held in the library for a given mailbox entity is more extensive than the context information extracted from that mailbox entity alone; if the match succeeds, outputting the mailbox entity in the mailbox reputation intelligence library that matches the mailbox entity to be identified, together with its context information; and if the match fails, outputting an empty result.

In a second aspect, the present application provides a method for generating a mailbox reputation intelligence library, the method comprising: collecting multiple mailbox entities through multiple information sources; extracting context information from each of the multiple mailbox entities, the context information being relevant to reputation assessment of the mailbox entity; integrating the context information of the same mailbox entity; performing reputation assessment on the integrated context information to generate an assessment result for each mailbox entity; and adding the assessment result of each mailbox entity to the corresponding integrated context information and storing it in the mailbox reputation intelligence library.

In a third aspect, the present application provides a device for acquiring mailbox context information, the device comprising: a receiving module, configured to acquire a mailbox entity to be identified; a matching module, configured to match the mailbox entity to be identified against the mailbox entities in a mailbox reputation intelligence library, wherein the mailbox reputation intelligence library contains various mailbox entities and their context information, and the context information held in the library for a given mailbox entity is more extensive than the context information extracted from that mailbox entity alone; an output module, entered if the match succeeds, configured to output the mailbox entity in the mailbox reputation intelligence library that matches the mailbox entity to be identified, together with its context information; and an adding module, entered if the match fails, configured to extract the context information from the mailbox entity to be identified and store the extracted context information and the mailbox entity to be identified in the mailbox reputation intelligence library.

In a fourth aspect, the present application provides a device for generating a mailbox reputation intelligence library, the device comprising: a collection module, configured to collect multiple mailbox entities through multiple information sources; an extraction module, configured to extract context information from each of the multiple mailbox entities, the context information being relevant to reputation assessment of the mailbox entity; an integration module, configured to integrate the context information of the same mailbox entity; an assessment module, configured to perform reputation assessment on the integrated context information and generate an assessment result for each mailbox entity; and a storage module, configured to add the assessment result of each mailbox entity to the corresponding integrated context information and store it in the mailbox reputation intelligence library.

In a fifth aspect, the present application provides a computer storage medium having a computer program stored thereon, wherein the program, when executed by a processor, implements the method of the first aspect or the second aspect.

In a sixth aspect, the present application provides an electronic device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the method of the first aspect or the second aspect when executing the program.

Compared with the prior art, when the context information of a mailbox entity to be identified is needed, the method provided in the first aspect uses a mailbox reputation intelligence library containing various mailbox entities and their rich context information to find the library entry that is the same as the mailbox entity to be identified. The context information of that entry is then taken as the context information of the mailbox entity to be identified and provided to the analyst. The analyst thus obtains richer context information about the mailbox entity to be identified and can trace its source efficiently and accurately on that basis.

The method for generating a mailbox reputation intelligence library provided in the second aspect, the device for acquiring mailbox context information provided in the third aspect, the device for generating a mailbox reputation intelligence library provided in the fourth aspect, the computer storage medium provided in the fifth aspect, and the electronic device provided in the sixth aspect have the same or similar beneficial effects as the method for acquiring mailbox context information provided in the first aspect.

Brief Description of the Drawings

The above and other purposes, features, and advantages of the exemplary embodiments of the present application will become easier to understand by reading the detailed description below with reference to the accompanying drawings. In the drawings, several embodiments of the present application are shown by way of example and not limitation, and the same or corresponding reference numerals denote the same or corresponding parts, wherein:

FIG. 1 is a flow chart of a method for acquiring mailbox context information in an embodiment of the present application;

FIG. 2 is a schematic diagram of a production and processing architecture for mailbox reputation intelligence in an embodiment of the present application;

FIG. 3 is a flow chart of a method for generating a mailbox reputation intelligence library in an embodiment of the present application;

FIG. 4 is a schematic diagram of the context information extracted from a mailbox entity in an embodiment of the present application;

FIG. 5 is a schematic diagram of the context information output for a mailbox entity in an embodiment of the present application;

FIG. 6 is a schematic diagram of the production process of the mailbox reputation intelligence library in an embodiment of the present application;

FIG. 7 is a schematic diagram of the deployment process of the mailbox reputation intelligence library in an embodiment of the present application;

FIG. 8 is a schematic structural diagram of a device for acquiring mailbox context information in an embodiment of the present application;

FIG. 9 is a schematic structural diagram of a device for generating a mailbox reputation intelligence library in an embodiment of the present application.

Detailed Description

Exemplary embodiments of the present application are described in more detail below with reference to the accompanying drawings. Although the drawings show exemplary embodiments of the present application, it should be understood that the present application can be implemented in various forms and should not be limited by the embodiments set forth here. Rather, these embodiments are provided so that the present application can be understood more thoroughly and its scope can be fully conveyed to those skilled in the art.

It should be noted that, unless otherwise specified, the technical or scientific terms used in this application have the ordinary meanings understood by those skilled in the field to which this application belongs.

At present, analysts generally obtain the context information of a target mailbox either manually or through existing mailbox reputation services. Manual acquisition, however, depends on each analyst's experience: it is inefficient, and the accuracy and comprehensiveness of the context information obtained vary from analyst to analyst. The context information obtained through mailbox reputation services is also fairly basic and cannot help analysts trace the target mailbox precisely. Obtaining mailbox context information in the current ways, and tracing sources based on it, therefore reduces the efficiency and accuracy of the analyst's tracing.

After in-depth research, the inventors found that the main reason for the low efficiency and accuracy of tracing is that the mailbox context information obtained is not rich enough, which in turn is because the mailbox reputation intelligence library supplying that information does not contain enough mailbox context information. If a mailbox reputation intelligence library containing a large amount of context information for various mailboxes can be provided, rich context information about the target mailbox can be looked up in it, and that information can help analysts trace the target mailbox more efficiently and precisely.

In view of this, the embodiments of the present application provide a method and device for acquiring mailbox context information, and a method and device for generating a mailbox reputation intelligence library. A mailbox reputation intelligence library containing rich context information for various mailboxes is established. When an analyst needs to trace a mailbox, the rich context information of that mailbox can be obtained from the mailbox reputation intelligence library, and the target mailbox can then be traced quickly and accurately based on it.

The method for acquiring mailbox context information provided by the embodiments of the present application is described in detail first.

FIG. 1 is a flow chart of a method for acquiring mailbox context information in an embodiment of the present application. Referring to FIG. 1, the method may include:

S101: Acquire a mailbox entity to be identified.

The mailbox entity to be identified is the mailbox entity whose source the analyst needs to trace, that is, the one whose context information needs to be obtained. A mailbox entity is a mailbox account, for example zhangsan@163.com, or an individual email message.

S102: Match the mailbox entity to be identified against the mailbox entities in the mailbox reputation intelligence library. If the match succeeds, execute step S103; if the match fails, execute step S104.

The mailbox reputation intelligence library contains various mailbox entities and their context information, and the context information held in the library for a given mailbox entity is more extensive than the context information extracted from that mailbox entity alone.

A mailbox entity by itself rarely yields comprehensive context information. The context information can therefore be obtained from a mailbox reputation intelligence library that contains various mailbox entities and their rich context information, which makes the obtained context information far richer.

Rich context information can be obtained from the mailbox reputation intelligence library mainly because the library stores mailbox entities collected from various information sources, so the context information extracted from them is comparatively rich. For example, suppose mailbox entity 1 and mailbox entity 2 are collected from information source A, context information a and context information b are extracted from mailbox entity 1, and context information c is extracted from mailbox entity 2. Mailbox entity 1 is also collected from information source B, and context information d is extracted from it. The mailbox reputation intelligence library then stores context information a, b, and d for mailbox entity 1, and context information c for mailbox entity 2. When an analyst needs to trace mailbox entity 1 and relies only on the copy of mailbox entity 1 at hand, the analyst may be able to obtain only context information a and b. If that copy is instead matched against the mailbox entities in the mailbox reputation intelligence library, the analyst obtains context information d in addition to a and b. The context information the analyst obtains for mailbox entity 1 is thus richer, and tracing becomes more efficient and accurate.

The mailbox reputation intelligence library itself can be obtained by integrating existing mailbox intelligence libraries, or it can be created from scratch (the creation method is described in detail later); no limitation is imposed here.
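The two-source example above can be sketched in a few lines of Python. This is only an illustration of the merging idea, not the implementation described in the application: the record layout, the `build_library` name, and the `mailbox-1`/`mailbox-2` identifiers are all hypothetical.

```python
from collections import defaultdict

# Hypothetical records: (information source, mailbox entity, extracted context item).
collected = [
    ("A", "mailbox-1", "a"),
    ("A", "mailbox-1", "b"),
    ("A", "mailbox-2", "c"),
    ("B", "mailbox-1", "d"),
]


def build_library(records):
    """Merge the context items of the same mailbox entity across all sources."""
    library = defaultdict(list)
    for _source, entity, item in records:
        if item not in library[entity]:  # skip items reported by several sources
            library[entity].append(item)
    return dict(library)


library = build_library(collected)
print(library["mailbox-1"])  # ['a', 'b', 'd']
print(library["mailbox-2"])  # ['c']
```

Looking up `mailbox-1` in the merged library returns the item contributed by source B as well, which is the extra context an analyst holding only the source-A copy would miss.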

When matching, the mailbox entity to be identified can be compared with each mailbox entity in the mailbox reputation intelligence library in turn. As soon as it is found to be the same as one of them, the match is considered successful and the remaining mailbox entities are not compared. Stopping early improves matching efficiency, which in turn speeds up the acquisition of mailbox context information and the analyst's tracing. If no match has been found by the time the last mailbox entity in the library has been compared, the library is considered to contain no mailbox entity matching the one to be identified. In that case the library holds no rich context information for it, and the analyst can only obtain the context information by other means.
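A minimal sketch of this sequential comparison with early termination follows. The record layout (`entity` and `context` keys) and the function name are assumptions made for illustration, and treating addresses as equal regardless of letter case is also an assumption; the text only requires the two entities to be "the same".

```python
from typing import Dict, List, Optional


def match_entity(query: str, library: List[Dict]) -> Optional[Dict]:
    """Compare the query with each stored entity in turn and stop at the first match."""
    normalized = query.strip().lower()  # assumption: case-insensitive comparison
    for record in library:
        if record["entity"].strip().lower() == normalized:
            return record  # early exit: later records are not compared
    return None  # reached the end of the library without a match


library = [
    {"entity": "zhangsan@163.com", "context": {"verdict": "malicious"}},
    {"entity": "lisi@example.com", "context": {"verdict": "benign"}},
]

hit = match_entity("ZhangSan@163.com", library)
miss = match_entity("nobody@example.com", library)
```

If the library is large, an index keyed by the normalized address (for example a Python `dict`) would give the same result without scanning every record; the linear scan is kept here because it mirrors the description.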

S103: Output the mailbox entity in the mailbox reputation intelligence library that matches the mailbox entity to be identified, together with its context information.

When the mailbox entity to be identified matches a mailbox entity in the mailbox reputation intelligence library, the context information of that library entry is output to the analyst as the context information of the mailbox entity to be identified. Because this context information is more comprehensive than what can be extracted from the mailbox entity to be identified itself, the analyst can trace the mailbox more efficiently and precisely.

S104: Output an empty result.

When the mailbox entity to be identified matches none of the mailbox entities in the mailbox reputation intelligence library, the library currently cannot provide rich context information for it, so the output can only be empty.

To fill such gaps in the mailbox reputation intelligence library, new mailbox reputation intelligence needs to be produced continually. After new intelligence has been added to the library, the mailbox entities for which the output was previously empty can be matched against the library again. If the match now succeeds, a notification can be sent to the analyst who previously queried that mailbox entity, indicating that the library now holds the corresponding context information.
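One way to picture this re-matching and notification step is the sketch below. The class, its method names, and the idea of recording which analyst asked are illustrative assumptions; the application does not specify how pending queries are tracked or how notifications are delivered.

```python
class ReputationLibrary:
    """Toy in-memory library that remembers queries that came back empty."""

    def __init__(self):
        self._records = {}  # mailbox entity -> context information
        self._pending = {}  # mailbox entity -> set of analysts still waiting

    def lookup(self, entity, analyst):
        """Return the stored context, or None (empty output) and remember the query."""
        context = self._records.get(entity)
        if context is None:
            self._pending.setdefault(entity, set()).add(analyst)
        return context

    def add_intelligence(self, entity, context):
        """Store newly produced intelligence and return notifications for waiting analysts."""
        self._records[entity] = context
        waiting = self._pending.pop(entity, set())
        return [
            f"{analyst}: context information for {entity} is now available"
            for analyst in sorted(waiting)
        ]


lib = ReputationLibrary()
first = lib.lookup("x@example.com", "alice")  # empty output, query is remembered
notices = lib.add_intelligence("x@example.com", {"verdict": "malicious"})
second = lib.lookup("x@example.com", "alice")  # now returns the stored context
```

Here the notifications are returned as strings for simplicity; a real deployment would presumably hand them to whatever messaging channel the analysts use.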

It should be noted that the context information mentioned above may refer to all information related to the mailbox entity. It may include not only the sender, subject, and so on, but also the reputation evaluation result of the mailbox. The specific content of the context information is not limited here.

As can be seen from the above, when the context information of a mailbox entity to be identified is needed, the method provided in the embodiments of the present application uses a mailbox reputation intelligence library containing various mailbox entities and their rich context information to find the library entry that is the same as the mailbox entity to be identified. The context information of that entry is taken as the context information of the mailbox entity to be identified and provided to the analyst, who thus obtains richer context information and can trace the mailbox entity efficiently and accurately on that basis.

Further, as a refinement and extension of the mailbox reputation intelligence library described above, the process of creating the library is described in detail below.

Before that, the production and processing architecture for the mailbox reputation intelligence in the library is described. FIG. 2 is a schematic diagram of this architecture in an embodiment of the present application. Referring to FIG. 2, the architecture may include a data access layer, a data storage layer, a data processing layer, and an application layer.

1. Data access layer: connects to multiple types of information sources. At this stage the sources are not yet mailbox reputation intelligence; they are the objects of comprehensive assessment and analysis, that is, the object entities from which information is to be extracted. The source types may include, but are not limited to: open-source data, Whois (domain name query protocol) records, indicator of compromise (IOC) information, threat intelligence (TI) data, Internet-wide crawler data, and blacklists/whitelists.

The data access layer may specifically include: a Whois information module, an open-source information module, a crawler information module, an IOC information module, a blacklist/whitelist module, and a file identification information module.

(1) Whois information module: receives Whois records.

(2) Open-source information module: receives open-source data.

(3) Crawler information module: receives Internet-wide crawler data.

(4) IOC information module: receives IOC information.

(5) Blacklist/whitelist module: receives mailbox blacklists and whitelists.

(6) File identification information module: receives identification information for files in the mailbox.

2. Data storage layer: stores the various kinds of context information of mailbox entities in the database. The information is correlated before it enters the production process, and is written to the database after being processed by cache queue tasks.

The data storage layer may specifically include: a data storage module, a mailbox reputation rating evaluation module, and a real-time dynamic reputation data module.

(1) Data storage module: stores the various kinds of context information of mailbox entities.

(2) Mailbox reputation rating evaluation module: evaluates the reputation level of a mailbox entity based on its various kinds of context information.

(3) Real-time dynamic reputation data module: dynamically updates the reputation level of mailbox entities in real time.

3. Data processing layer: extracts data of different types and sources according to entity extraction rules, and performs integration, correlation, database storage, and other operations on it.

The data processing layer may specifically include: a task scheduling module, a data parsing module, a data extraction module, a data conversion module, and a rule definition module.

(1) Task scheduling module: schedules the data parsing module, the data extraction module, and the data conversion module.

(2) Data parsing module: parses data of different types and sources.

(3) Data extraction module: extracts the context information of mailbox entities from data of different types and sources according to the entity extraction rules.

(4) Data conversion module: converts the extracted context information of mailbox entities into a preset format.

(5) Rule definition module: defines the entity extraction rules.
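The division of work among these five modules can be sketched as a small pipeline. Everything concrete here is an assumption made for illustration: the simplified address pattern standing in for an entity extraction rule, JSON as the preset format, and all function names. The application does not specify any of them.

```python
import json
import re
from typing import Dict, List, Tuple

# Rule definition: a hypothetical extraction rule, here a simplified e-mail pattern.
EMAIL_RULE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def parse(raw: str, source_type: str) -> Dict[str, str]:
    """Data parsing: wrap a raw item together with the type of source it came from."""
    return {"source": source_type, "text": raw}


def extract(parsed: Dict[str, str]) -> List[Dict[str, str]]:
    """Data extraction: apply the rule and keep the source as context information."""
    return [
        {"entity": address.lower(), "source": parsed["source"]}
        for address in EMAIL_RULE.findall(parsed["text"])
    ]


def convert(item: Dict[str, str]) -> str:
    """Data conversion: serialize to a preset format (JSON with sorted keys here)."""
    return json.dumps(item, sort_keys=True)


def run_pipeline(raw_items: List[Tuple[str, str]]) -> List[str]:
    """Task scheduling: run parse -> extract -> convert for every raw item in order."""
    converted = []
    for raw, source_type in raw_items:
        for item in extract(parse(raw, source_type)):
            converted.append(convert(item))
    return converted


rows = run_pipeline([("Registrant Email: Admin@Example.com", "whois")])
print(rows)
```

Keeping the rule in one place, separate from the parsing and conversion steps, matches the role the text gives the rule definition module: the rule can change without touching the rest of the pipeline.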

4、应用层：通过预置的邮箱白名单，进行邮箱研判结果的校验，进行邮箱实体信誉画像等，以及提供信誉查询服务等。完成数据的接口字段封装及应用程序界面(Application Program Interface，API)接口提供，并对外提供API或应用调用等，实现威胁分析价值。4. Application layer: verifies the mailbox analysis results against the preset mailbox whitelist, builds reputation profiles of mailbox entities, and provides reputation query services. It encapsulates the interface fields of the data, provides an application programming interface (API), and exposes API or application calls externally so that the data delivers threat-analysis value.

应用层具体可以包括：白名单前置模块、邮箱信誉画像模块、邮箱信誉查询模块和邮箱信誉服务模块。The application layer may specifically include: a whitelist pre-module, an email reputation profiling module, an email reputation query module, and an email reputation service module.

(1)白名单前置模块：用于存储预置的白名单，并根据预置的白名单对邮箱研判结果进行校验。(1) Whitelist pre-module: used to store the preset whitelist and verify the mailbox analysis results based on the preset whitelist.

(2)邮箱信誉画像模块：用于根据邮箱实体的上下文信息以及信誉研判结果对邮箱进行画像。(2) Email reputation profiling module: used to profile the email based on the email entity’s contextual information and reputation assessment results.

(3)邮箱信誉查询模块：用于根据用户输入的邮箱实体查询并输出该邮箱实体所有的相关信息。(3) Mailbox reputation query module: used to query and output all relevant information of the mailbox entity based on the mailbox entity input by the user.

(4)邮箱信誉服务模块：用于提供邮箱信誉查询的其它服务。(4) Mailbox reputation service module: used to provide other services for mailbox reputation query.

以上为邮箱信誉情报库中各邮箱信誉情报的生产架构示意，接下来详细对邮箱信誉情报库的创建过程进行说明。图3为本申请实施例中邮箱信誉情报库的生成方法的流程示意图，参见图3所示，该方法可以包括：The above is a schematic diagram of the production architecture of each mailbox reputation intelligence in the mailbox reputation intelligence library. Next, the creation process of the mailbox reputation intelligence library is described in detail. Figure 3 is a flow chart of the method for generating the mailbox reputation intelligence library in an embodiment of the present application. Referring to Figure 3, the method may include:

S301：通过多种信息源采集多个邮箱实体。S301: Collect multiple mailbox entities through multiple information sources.

为了构建邮箱信誉情报库，使得邮箱信誉情报库中的邮箱实体更为丰富，使得邮箱实体对应的上下文信息更为丰富，就需要从多方采集与邮箱实体相关的一切信息。In order to build a mailbox reputation intelligence database, make the mailbox entities in the mailbox reputation intelligence database richer, and make the context information corresponding to the mailbox entities richer, it is necessary to collect all information related to the mailbox entities from multiple parties.

具体来说，上述步骤S301可以包括：Specifically, the above step S301 may include:

步骤A：从开源数据、域名查询协议Whois记录、失陷攻击指标IOC信息、威胁情报TI数据、大网爬虫数据和黑/白名单中采集多个邮箱实体。Step A: Collect multiple mailbox entities from open source data, Whois (domain name query protocol) records, indicators of compromise (IOC) information, threat intelligence (TI) data, Internet-wide crawler data, and black/white lists.

开源数据，一般可以是指在互联网上公开数据集，任何人通过其使用的终端都能够直接获取到的数据。Open source data generally refers to data sets that are publicly available on the Internet and can be directly accessed by anyone through the terminal they use.

域名查询协议Whois记录，Whois就是一个用来查询域名是否已经被注册，以及注册域名的详细信息的数据库。通过Whois记录也能够获取到邮箱实体的相关数据。Whois records: Whois is a database used to query whether a domain name has been registered and to obtain the registration details of a registered domain name. Data related to mailbox entities can also be obtained from Whois records.

失陷攻击指标IOC信息，IOC是发现威胁最有效的判断方法之一。通过IOC能够实现威胁发现的各类实体包括但不限于：网际互连协议(Internet Protocol，IP)地址、域名、文件哈希(Hash)、邮件地址等。IOC能够提供威胁研判的标示以发现各类攻击活动迹象。因此，通过IOC信息也能够获得邮箱实体的各种信息。Indicators of compromise (IOC) information: IOCs are one of the most effective means of discovering threats. The entities through which IOCs enable threat discovery include, but are not limited to, Internet Protocol (IP) addresses, domain names, file hashes, and email addresses. IOCs provide markers for threat analysis that reveal signs of various attack activities. Therefore, various information about mailbox entities can also be obtained from IOC information.

威胁情报TI数据，在TI数据中，会存在有各种邮箱信誉情报，因此，通过TI数据也能够获取邮箱实体的各种信息。Threat intelligence TI data includes various mailbox reputation intelligence. Therefore, various information about mailbox entities can also be obtained through TI data.

大网爬虫数据，通过爬虫能够获取到互联网中各种各样的数据，因此，通过爬虫也能够爬取到与邮箱实体相关的各种信息。Internet-wide crawler data: crawlers can obtain all kinds of data from the Internet, so various information related to mailbox entities can also be crawled.

黑/白名单，即邮箱的黑/白名单，也属于邮箱实体的一种信息。Blacklist/whitelist, that is, the blacklist/whitelist of the mailbox, is also a kind of information of the mailbox entity.

当然，还可以通过上述信息源外的其它信息源获取邮箱实体，例如：个人提供的邮箱实体等。对于采集多个邮箱实体的具体信息源，此处不做限定。Of course, the mailbox entity can also be obtained through other information sources other than the above information sources, such as: a mailbox entity provided by an individual, etc. The specific information source for collecting multiple mailbox entities is not limited here.

由上述内容可知，通过多种信息源进行邮箱实体的采集，能够使得邮箱实体获取的最大化，进而最终使得邮箱信誉情报库中的邮箱情报更加丰富，也提供更加丰富的上下文信息。From the above content, it can be seen that collecting mailbox entities through multiple information sources can maximize the acquisition of mailbox entities, thereby ultimately enriching the mailbox intelligence in the mailbox reputation intelligence library and providing richer context information.

S302：分别从多个邮箱实体中提取上下文信息。S302: extracting context information from multiple mailbox entities respectively.

其中，上下文信息与邮箱实体信誉研判相关。Among them, the context information is related to the reputation assessment of the mailbox entity.

在采集到多个邮箱实体后，就需要将与邮箱实体相关的、能够在邮箱信誉评价中使用到的信息进行抽取，即抽取上下文信息，以使得邮箱信誉情报库中邮箱实体的相关信息的丰富程度最大化。After collecting multiple mailbox entities, it is necessary to extract information related to the mailbox entity that can be used in the mailbox reputation evaluation, that is, to extract context information, so as to maximize the richness of the relevant information of the mailbox entity in the mailbox reputation intelligence library.

具体来说，上述步骤S302可以包括：Specifically, the above step S302 may include:

步骤B：提取多个邮箱实体中的发件人、邮件标签、IP地址、邮件主题列表、关联样本、邮箱实体Whois注册信息反查结果、恶意标签、域名以及活跃时间属性中的至少一项，作为多个邮箱实体分别对应的上下文信息。Step B: extract at least one of the sender, email label, IP address, email subject list, associated samples, email entity Whois registration information reverse query results, malicious label, domain name and active time attributes from multiple mailbox entities as the context information corresponding to the multiple mailbox entities respectively.

也就是说，针对每一个邮箱实体，都需要抽取其发件人、邮件标签、IP地址、邮件主题列表、关联样本、邮箱实体Whois注册信息反查结果、恶意标签、域名以及活跃时间属性中的至少一项，并作为相应邮箱实体的上下文信息。That is to say, for each mailbox entity, it is necessary to extract at least one of its sender, email label, IP address, email subject list, associated samples, mailbox entity Whois registration information reverse query results, malicious labels, domain name and active time attributes as the context information of the corresponding mailbox entity.

发件人，可以是指发送邮件的主体，例如：张三。The sender can refer to the subject who sends the email, for example: Zhang San.

邮件标签，具体可以包括：邮箱后缀(即免费的公共邮件服务，例如：163.com)、收件人(具体可以是收件人单位列表、行业列表等)等。Mail labels may specifically include: mailbox suffix (i.e. free public mail service, such as: 163.com), recipients (specifically, it may be a list of recipient units, a list of industries, etc.), etc.

IP地址，具体可以包括：邮件服务器IP地址和发件人真实IP地址等。IP address, which may include: mail server IP address and sender's real IP address, etc.

邮件主题列表，也就是邮件主题名称的集合。Email subject list, that is, a collection of email subject names.

关联样本，也就是与该邮件的某一内容一致的内容样本。具体可以包括：作为发件人邮箱的样本、作为收件人邮箱的样本、发件人样本中的附件和作为窃密木马内嵌邮箱的样本。Associated samples are samples whose content matches some content of the email. Specifically, they may include: samples in which the mailbox is the sender, samples in which the mailbox is the recipient, attachments in the sender samples, and samples in which the mailbox is embedded in an information-stealing Trojan.

邮箱实体Whois注册信息反查结果，也就是在Whois中查询到的该邮件的相关信息。The result of reverse query of the mailbox entity Whois registration information, that is, the relevant information of the email found in Whois.

恶意标签，也就是该邮件所携带的能够表明该邮件性质的标签。具体可以包括：SPAM垃圾邮件、phish钓鱼、fraud诈骗、advertising广告、APT和hacked。Malicious tags are tags that indicate the nature of the email, including SPAM, phish, fraud, advertising, APT, and hacked.

域名，具体可以包括：域名的主要mx服务器、单位、行业。Domain name, which may specifically include: the domain's primary MX server, the organization, and the industry.

活跃时间属性，具体可以包括：首次活跃时间、最近活跃时间。Active time attributes may specifically include: first active time, most recent active time.
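As a rough illustration, the context categories listed above could be held in one record per mailbox entity. The following Python sketch is an assumption for illustration only: the class name `MailboxContext`, its field names, and the timestamp format are hypothetical and do not appear in the application.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class MailboxContext:
    """Hypothetical container for the context categories named in step B."""
    email: str
    senders: list[str] = field(default_factory=list)
    mail_labels: list[str] = field(default_factory=list)       # e.g. mailbox suffix, recipient lists
    ip_addresses: list[str] = field(default_factory=list)      # mail server IP, sender's real IP
    subjects: list[str] = field(default_factory=list)
    associated_samples: list[str] = field(default_factory=list)
    whois_reverse_lookup: list[str] = field(default_factory=list)
    malicious_tags: list[str] = field(default_factory=list)    # e.g. "spam", "phish", "fraud"
    domain: Optional[str] = None
    first_active: Optional[str] = None                         # assumed to be an ISO 8601 string
    last_active: Optional[str] = None


# Only the categories that were actually extracted need to be filled in,
# matching the "at least one of" wording of step B.
ctx = MailboxContext(email="someone@example.com", malicious_tags=["phish"])
```

Using `default_factory` gives every record its own empty lists, so categories that were not extracted stay empty without being shared between entities.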

为了能够更加清楚地看到都提取了哪些上下文信息，在此以一幅图来进行形象的说明。图4为本申请实施例中提取的邮箱实体的上下文信息示意图，参见图4所示，提取的邮箱实体的上下文信息即邮件情报富化信息，大的方面主要包括：发件人、邮件标签、IP地址、邮件主题列表、关联样本、Whois注册信息反查结果、恶意标签、域名和活跃时间属性。某一些大类下还分别列举了一些小类，都在图4中明确的给出，此处不再赘述。In order to more clearly see what context information has been extracted, a picture is used here for a vivid explanation. Figure 4 is a schematic diagram of the context information of the mailbox entity extracted in the embodiment of the present application. As shown in Figure 4, the extracted context information of the mailbox entity is the email intelligence enrichment information, and the major aspects mainly include: sender, email label, IP address, email subject list, associated samples, Whois registration information reverse query results, malicious labels, domain names and active time attributes. Some subcategories are also listed under some major categories, which are clearly given in Figure 4 and will not be repeated here.

由上述内容可知，通过从邮箱实体中提取出与邮箱信誉研判相关的上下文信息，使得后续能够基于该上下文信息对邮箱实体进行信誉评价，并保存在邮箱信誉情报库中，使得邮箱信誉情报库中的情报更加丰富，为后续待识别的邮箱实体提供更加丰富的情报数据，提高分析师对邮箱实体追踪溯源的效率和准确性。From the above content, it can be seen that by extracting contextual information related to mailbox reputation assessment from the mailbox entity, the mailbox entity can be subsequently evaluated for reputation based on the contextual information and saved in the mailbox reputation intelligence library, making the intelligence in the mailbox reputation intelligence library richer, providing richer intelligence data for subsequent mailbox entities to be identified, and improving the efficiency and accuracy of analysts in tracking and tracing mailbox entities.

而从不同信息源获取的邮箱实体的形式各异，为了能够从这些邮箱实体中提取到相应的发件人、标签、IP地址、邮件主题列表、关联样本、Whois注册信息反查结果、恶意标签、域名和活跃时间属性，可以采用本申请实施例提供的一种信息抽取方式。The forms of mailbox entities obtained from different information sources are different. In order to extract the corresponding senders, labels, IP addresses, email subject lists, associated samples, Whois registration information reverse query results, malicious labels, domain names and active time attributes from these mailbox entities, an information extraction method provided in an embodiment of the present application can be adopted.

具体来说，上述步骤B可以包括：Specifically, the above step B may include:

步骤B1：分别提取多个邮箱实体的鉴定报告的原始数据中的mail_info、target、file、type字段，得到多个邮箱实体分别对应的第一子信息。Step B1: extract mail_info, target, file, and type fields from the original data of the identification reports of multiple mailbox entities respectively, and obtain the first sub-information corresponding to the multiple mailbox entities respectively.

也就是说，提取所有文件鉴定过程中邮件相关的报告report里的原始数据字段mail_info.sender，字段信息为list，筛选存在'mail_info'字段的项，同时提取target、file、type等字段，针对email和msg等部分进行分别处理。In other words, the raw data field mail_info.sender is extracted from the mail-related reports produced during file identification. The field value is a list; items that contain the 'mail_info' field are selected, the target, file, type and other fields are extracted at the same time, and the email and msg parts are processed separately.
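The filtering described in step B1 can be sketched as follows. This is a minimal sketch that assumes each report item is a dictionary; the function name `extract_first_sub_info` and the sample data are hypothetical, and only the field names `mail_info`, `target`, `file` and `type` come from the text.

```python
from typing import Any


def extract_first_sub_info(report_items: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Keep only items that carry 'mail_info' and pull out the fields named in step B1."""
    results = []
    for item in report_items:
        if "mail_info" not in item:
            continue  # not a mail-related report entry
        results.append({
            "mail_info": item["mail_info"],
            "target": item.get("target"),
            "file": item.get("file"),
            "type": item.get("type"),
        })
    return results


# Made-up sample input: the second item has no 'mail_info' and is skipped.
sample_report = [
    {"mail_info": {"sender": ["a@example.com"]}, "target": "t1", "file": "x.eml", "type": "email"},
    {"target": "t2", "file": "y.exe", "type": "pe"},
]
first_sub_info = extract_first_sub_info(sample_report)
```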

步骤B2：分别提取多个邮箱实体的信息摘要算法(Message Digest Algorithm，MD5)信息，得到多个邮箱实体分别对应的第二子信息。Step B2: extracting Message Digest Algorithm (MD5) information of multiple mailbox entities respectively, and obtaining second sub-information corresponding to the multiple mailbox entities respectively.

通过提取邮箱实体的MD5信息，能够基于邮件的内容本身进行快速研判，使得邮箱信誉情报库中的情报更加准确。By extracting the MD5 information of the mailbox entity, it is possible to quickly analyze and judge based on the content of the email itself, making the information in the mailbox reputation intelligence database more accurate.
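As a small illustration of step B2, and assuming the MD5 is computed over the raw bytes of a mail sample (the text does not say exactly what is hashed), the digest can be obtained with Python's standard `hashlib` module. The helper name `md5_hex` is hypothetical.

```python
import hashlib


def md5_hex(content: bytes) -> str:
    """Return the hexadecimal MD5 digest of the given bytes."""
    return hashlib.md5(content).hexdigest()


digest = md5_hex(b"hello")
```

Two samples with identical bytes yield the same digest, which is what makes the value usable as a quick lookup key for previously analyzed content.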

步骤B3：分别提取多个邮箱实体的沙箱数据报文中邮件的相关信息，得到多个邮箱实体分别对应的第三子信息。Step B3: extract the relevant information of the emails in the sandbox data messages of the multiple mailbox entities respectively, and obtain the third sub-information corresponding to the multiple mailbox entities respectively.

沙箱，即report packet，是一个虚拟系统程序，允许邮件email在沙盘环境中运行浏览器或其他程序。这样，就能够得到邮件的实际运行结果内容，从而对邮件进行研判，使得邮箱信誉情报库中的情报更加准确。The sandbox, referred to here as the report packet, is a virtual system program that allows an email to run a browser or other programs inside an isolated sandbox environment. In this way, the actual results of running the email can be obtained and the email can be analyzed on that basis, making the intelligence in the mailbox reputation intelligence library more accurate.

步骤B4：分别提取多个邮箱实体的邮件交换(Mail Exchanger，MX)记录、发送方策略框架(Sender Policy Framework，SPF)协议信息、DMARC(Domain-based Message Authentication, Reporting, and Conformance)信息，得到多个邮箱实体分别对应的第四子信息。Step B4: Extract the Mail Exchanger (MX) records, Sender Policy Framework (SPF) protocol information, and DMARC (Domain-based Message Authentication, Reporting, and Conformance) information of multiple mailbox entities respectively, and obtain the fourth sub-information corresponding to the multiple mailbox entities respectively.

MX，指向一个邮件服务器，用于电子邮件系统发邮件时根据收信人的地址后缀来定位邮件服务器。SPF，能够为邮件的接收方提供一种检验。DMARC，代表基于域的消息认证，是一个DNS TXT记录，可以发布给域，以控制消息认证失败时会发生什么。通过获取邮箱实体的MX记录、SPF协议信息、DMARC信息，能够获得邮箱实体更为丰富的上下文信息，进而富化邮箱信誉情报库。An MX record points to a mail server; when sending mail, the email system uses it to locate the mail server from the suffix of the recipient's address. SPF gives the receiving side a way to check whether the sending server is authorized to send mail for the domain. DMARC, which stands for Domain-based Message Authentication, Reporting, and Conformance, is a DNS TXT record published for a domain to control what happens when message authentication fails. By obtaining the MX records, SPF protocol information, and DMARC information of a mailbox entity, richer context information about the entity can be obtained, thereby enriching the mailbox reputation intelligence library.
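To illustrate what the SPF and DMARC information looks like once retrieved, the sketch below parses record strings that have already been fetched from DNS (a DMARC policy is published as a TXT record under `_dmarc.<domain>`). It is a simplified sketch, not a full validator; the function names and the sample record values are hypothetical, and the DNS lookup itself is left out.

```python
def parse_dmarc(txt: str) -> dict[str, str]:
    """Split a DMARC TXT value such as 'v=DMARC1; p=reject' into tag/value pairs."""
    tags = {}
    for part in txt.split(";"):
        part = part.strip()
        if not part or "=" not in part:
            continue  # skip empty or malformed segments
        key, value = part.split("=", 1)
        tags[key.strip().lower()] = value.strip()
    return tags


def spf_terms(txt: str) -> list[str]:
    """Return the terms after the 'v=spf1' version marker, or [] if this is not an SPF record."""
    terms = txt.split()
    if not terms or terms[0].lower() != "v=spf1":
        return []
    return terms[1:]


# Made-up example records.
dmarc = parse_dmarc("v=DMARC1; p=reject; rua=mailto:dmarc@example.com")
spf = spf_terms("v=spf1 include:_spf.example.com -all")
```

The `p` tag of the DMARC record (here `reject`) is the policy applied when authentication fails, which is the kind of detail that could be stored as part of the fourth sub-information.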

步骤B5：通过失陷检测模块对第一子信息、第二子信息、第三子信息和第四子信息进行检测，得到多个邮箱实体分别对应的检测结果。Step B5: The first sub-information, the second sub-information, the third sub-information and the fourth sub-information are checked by the compromise detection module to obtain detection results corresponding to the multiple mailbox entities respectively.

也就是说，将邮箱字段信息对接失陷检测类接口，根据得到的IOC、恶意家族(malicious_family)、活动(campaign)等信息进行信息关联和转换，映射为邮件(email)、画像(portraits)、标签(tag)、恶意活动(malicious_activity)等字段信息。That is to say, the mailbox field information is connected to the compromise detection interface, and the information is associated and converted according to the obtained IOC, malicious family (malicious_family), activity (campaign) and other information, and mapped to field information such as email, portraits (portraits), tags (tag), malicious activity (malicious_activity), etc.

这里的失陷检测模块，就是能够对实体信息进行恶意检测的工具，例如：现有的各种失陷类检测工具。将实体信息与失陷类检测接口对接，就能够实现实体信息的恶意检测。The compromise detection module here is a tool that can detect malicious intent in entity information, such as various existing compromise detection tools. By connecting entity information with the compromise detection interface, malicious intent detection can be achieved.

步骤B6：基于第一子信息、第二子信息、第三子信息、第四子信息和检测结果，确定多个邮箱实体分别对应的发件人、标签、IP地址、邮件主题列表、关联样本、邮箱实体Whois注册信息反查结果、恶意标签、域名和活跃时间属性，并作为多个邮箱实体分别对应的上下文信息。Step B6: Based on the first sub-information, the second sub-information, the third sub-information, the fourth sub-information and the detection results, determine the senders, labels, IP addresses, email subject lists, associated samples, reverse query results of the mailbox entity Whois registration information, malicious labels, domain names and active time attributes corresponding to the multiple mailbox entities respectively, and use them as the context information corresponding to the multiple mailbox entities respectively.

在将上述各子信息抽取完毕后，按照相应的类别，分别归入到发件人、标签、IP地址、邮件主题列表、关联样本、邮箱实体Whois注册信息反查结果、恶意标签、域名和活跃时间属性这些类别中，最后就得到了邮箱实体的上下文信息了。After extracting the above sub-information, they are classified into sender, label, IP address, email subject list, associated samples, mailbox entity Whois registration information reverse query results, malicious labels, domain names and active time attributes according to the corresponding categories. Finally, the context information of the mailbox entity is obtained.

由上述内容可知，通过上述信息抽取方式能够最大程度的获取到邮箱实体涉及的各种上下文信息，进而使得邮箱信誉情报库中的情报内容更加富化，提升分析师追踪溯源的效率和准确性。From the above content, it can be seen that the above information extraction method can obtain various contextual information related to the mailbox entity to the greatest extent, thereby enriching the intelligence content in the mailbox reputation intelligence library and improving the efficiency and accuracy of analysts' tracking and tracing.

而当后续又有新的抽取规则产生时，可以在上述信息抽取方式的基础上再添加新的抽取方式进行后续邮箱实体的上下文信息的抽取，并将先前的采集的邮箱实体按照新添加的抽取方式再进行上下文信息的抽取。这样，能够使得邮箱信誉情报库中的内容持续更新并进一步富化。When new extraction rules are generated later, a new extraction method can be added on the basis of the above information extraction method to extract the context information of the subsequent mailbox entity, and the previously collected mailbox entity can be subjected to the context information extraction according to the newly added extraction method. In this way, the content in the mailbox reputation intelligence database can be continuously updated and further enriched.

具体来说，在上一次完成邮箱实体的上下文信息入库后，该方法还可以包括：Specifically, after the last completion of storing the context information of the mailbox entity, the method may further include:

步骤C1：接收新增抽取规则。Step C1: Receive newly added extraction rules.

在这里，新增抽取规则一般是指前期没有规则总结到的，而后期想到的一些邮箱实体上下文信息的抽取规则。Here, the newly added extraction rules generally refer to the extraction rules for some mailbox entity context information that were not summarized in the early stage but were thought of later.

步骤C2：按照新增抽取规则分别从多个邮箱实体中提取上下文信息。Step C2: extract context information from multiple mailbox entities respectively according to the newly added extraction rules.

也就是说，在邮箱信誉情报库已有的邮箱实体中，除了按照上述步骤B1-B6的方式提取上下文信息之外，还需要再次按照新增抽取规则提取上下文信息。接着，进行整合、研判完毕后，再存入邮箱信誉情报库中相应的邮箱实体处。That is to say, in the mailbox entity already in the mailbox reputation intelligence database, in addition to extracting context information according to the above steps B1-B6, it is necessary to extract context information again according to the newly added extraction rules. Then, after integration and judgment, it is stored in the corresponding mailbox entity in the mailbox reputation intelligence database.

由上述内容可知，通过新增抽取规则，并采用新增抽取规则再次抽取邮箱信誉情报库中已存在的邮箱实体的上下文信息，使得邮箱信誉情报库中邮箱实体的上下文信息能够持续富化。From the above content, it can be seen that by adding new extraction rules and using the new extraction rules to re-extract the context information of the mailbox entities already existing in the mailbox reputation intelligence library, the context information of the mailbox entities in the mailbox reputation intelligence library can be continuously enriched.
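Steps C1 and C2 can be sketched as re-running a newly received rule over every entity already in the library and merging whatever it yields. This is a minimal sketch under stated assumptions: the in-memory `store` dictionary stands in for the mailbox reputation intelligence library, and `Rule`, `apply_new_rule` and `domain_rule` are hypothetical names.

```python
from typing import Callable

# A rule takes a mailbox address and returns extra context as {category: [values]}.
Rule = Callable[[str], dict[str, list[str]]]


def apply_new_rule(store: dict[str, dict[str, list[str]]], rule: Rule) -> None:
    """Re-run one newly added rule over every stored entity and merge the results in place."""
    for email, context in store.items():
        for category, values in rule(email).items():
            existing = context.setdefault(category, [])
            for value in values:
                if value not in existing:  # avoid duplicates on repeated runs
                    existing.append(value)


def domain_rule(email: str) -> dict[str, list[str]]:
    """Hypothetical new rule: record the domain part of the address."""
    return {"domain": [email.rsplit("@", 1)[-1].lower()]}


store = {"a@example.com": {"malicious_tags": ["spam"]}}
apply_new_rule(store, domain_rule)
apply_new_rule(store, domain_rule)  # a second run adds nothing new
```

Because existing values are checked before appending, the same rule can be applied repeatedly without creating duplicate context entries.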

下面给出一种具体的新增抽取规则，上述步骤C2具体可以包括：A specific new extraction rule is given below. The above step C2 may specifically include:

步骤C21：分别提取多个邮箱实体中的域名。Step C21: extracting domain names from multiple mailbox entities respectively.

域名，即domain。通过email字段即可提取到。可以通过正则提取email字段中的"domain"域名部分。当然，还可以采用其它方式进行"domain"域名部分的提取，例如：扫描提取等。对于进行"domain"域名部分提取的具体方式，此处不做限定。Domain name, that is, domain. It can be extracted through the email field. The "domain" domain part in the email field can be extracted through regular expression. Of course, other methods can also be used to extract the "domain" domain part, such as scanning extraction. The specific method for extracting the "domain" domain part is not limited here.
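The regular-expression extraction mentioned above might look like the following sketch. The pattern is deliberately simple and is an assumption, not the pattern used in the application; the helper name `extract_domain` is hypothetical.

```python
import re
from typing import Optional

# Simple pattern: dot-separated labels after an '@', anchored at the end of the string.
_DOMAIN_RE = re.compile(r"@([A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+)$")


def extract_domain(email: str) -> Optional[str]:
    """Return the lower-cased domain part of an email address, or None if none is found."""
    match = _DOMAIN_RE.search(email.strip())
    return match.group(1).lower() if match else None


domain = extract_domain("info@qianxin.com")
```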

步骤C22：判断域名是否与已知恶意域名存在关联关系。若是，则执行步骤C23，若否，则执行步骤C24。Step C22: Determine whether the domain name is associated with a known malicious domain name. If yes, execute step C23; if no, execute step C24.

步骤C23：将相同的域名和已知恶意域名作为相应邮箱实体的上下文信息。Step C23: Use the extracted domain name together with the associated known malicious domain name as context information of the corresponding mailbox entity.

步骤C24：将提取的域名作为相应邮箱实体的上下文信息。Step C24: Use the extracted domain name as context information of the corresponding mailbox entity.

如果从email字段中提取到的"domain"域名部分与已知的恶意域名存在关联关系，说明该域名对应的邮箱实体存在问题，那么，在提取的上下文信息中，就需要加入该已知的恶意域名，以便后续输出该邮箱实体的上下文信息时，还提供有与之相关的恶意域名，提醒分析师注意，进而提高分析师追踪溯源的效率和准确性。If the "domain" domain name extracted from the email field is associated with a known malicious domain name, it means that there is a problem with the mailbox entity corresponding to the domain name. In this case, the known malicious domain name needs to be added to the extracted context information so that when the context information of the mailbox entity is subsequently output, the related malicious domain name can also be provided to alert the analyst, thereby improving the efficiency and accuracy of the analyst's tracking and tracing.

而如果从email字段中提取到的"domain"域名部分与已知的恶意域名并不存在关联关系，说明该域名对应的邮箱实体目前没有发现异常，那么，在提取的上下文信息中，就只有从该邮箱实体中提取到的上下文信息，进而提供给分析师进行追踪溯源。If the "domain" domain name extracted from the email field has no correlation with the known malicious domain name, it means that there is no abnormality in the email entity corresponding to the domain name. In this case, the extracted context information will only contain the context information extracted from the email entity, which will be provided to the analyst for tracking and tracing.

由上述内容可知，通过在email字段中提取"domain"域名部分，并与已知的恶意域名进行关联分析，进而将存在关联关系的恶意域名加入到相应邮箱实体的上下文信息中，以便后续输出该邮箱实体的上下文信息时，还提供有与之相关的恶意域名，提醒分析师注意，进而提高分析师追踪溯源的效率和准确性。From the above content, we can see that by extracting the "domain" domain name part in the email field and performing correlation analysis with known malicious domain names, the associated malicious domain names are added to the context information of the corresponding mailbox entity, so that when the context information of the mailbox entity is subsequently output, the related malicious domain names are also provided to remind analysts, thereby improving the efficiency and accuracy of analysts' tracking and tracing.
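The branch in steps C22 to C24 can be sketched as a lookup followed by a conditional addition. The text does not define how the association between domains is determined, so this sketch assumes a precomputed table mapping a domain to its associated known malicious domains; the table contents, the function name `domain_context` and the key names are hypothetical.

```python
def domain_context(domain: str, associations: dict[str, set[str]]) -> dict[str, list[str]]:
    """Build the domain-related context for one mailbox entity (steps C22 to C24)."""
    context = {"domain": [domain]}
    related = associations.get(domain, set())            # step C22: is there an association?
    if related:
        context["related_malicious_domains"] = sorted(related)  # step C23
    return context                                        # step C24 when nothing is related


# Made-up association table.
known_associations = {"bad-mail.example": {"evil.example", "c2.example"}}

flagged = domain_context("bad-mail.example", known_associations)
clean = domain_context("example.com", known_associations)
```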

S303：将同一邮箱实体的上下文信息进行整合。S303: Integrate the context information of the same mailbox entity.

在抽取完各邮箱实体的上下文信息后，由于从不同信息源采集的邮箱可能会存在是同一个邮箱的问题，为了避免邮箱信誉情报库中同一个邮箱实体对应有重复的上下文信息，因此需要将同一邮箱实体的上下文信息进行整合，即合并去重。After extracting the context information of each mailbox entity, since the mailboxes collected from different information sources may be the same mailbox, in order to avoid duplicate context information corresponding to the same mailbox entity in the mailbox reputation intelligence library, the context information of the same mailbox entity needs to be integrated, that is, merged and deduplicated.

举例来说，从A信息源采集的邮箱实体1的上下文信息为a、b、c，从B信息源采集的邮箱实体1的上下文信息为b、c、d。其中，从两个信息源采集的邮箱实体的上下文信息中b和c是重复的，因此需要进行整合，整合得到邮箱实体1的上下文信息a、b、c、d。For example, the context information of mailbox entity 1 collected from information source A is a, b, c, and the context information of mailbox entity 1 collected from information source B is b, c, d. Items b and c appear in the context information from both sources, so the two sets are integrated to obtain the context information a, b, c, d for mailbox entity 1.
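The merge-and-deduplicate step in this example can be sketched in a few lines. The function name `merge_context` is hypothetical, and preserving first-seen order is a design choice of this sketch rather than something the text requires.

```python
def merge_context(*sources: list[str]) -> list[str]:
    """Merge context items from several sources, dropping duplicates and keeping first-seen order."""
    seen = set()
    merged = []
    for items in sources:
        for item in items:
            if item not in seen:
                seen.add(item)
                merged.append(item)
    return merged


# The example from the text: source A yields a, b, c and source B yields b, c, d.
merged = merge_context(["a", "b", "c"], ["b", "c", "d"])
```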

S304：对整合后的上下文信息进行信誉研判，生成各个邮箱实体的研判结果。S304: Conduct reputation assessment on the integrated context information to generate assessment results for each mailbox entity.

在得到各个邮箱实体不重复的上下文信息后，需要进一步对这些上下文信息进行信誉研判，得出各邮箱实体的研判结果，以便后续向分析师提供邮箱实体的上下文信息的同时，还能够提供该邮箱实体的综合研判结果，以便分析师对于待识别的邮箱实体有一个整体认知。After obtaining the non-repetitive context information of each mailbox entity, it is necessary to further conduct credibility analysis on this context information to obtain the analysis results of each mailbox entity, so that the analyst can be provided with the context information of the mailbox entity and the comprehensive analysis results of the mailbox entity in the future, so that the analyst can have a holistic understanding of the mailbox entity to be identified.

具体来说，上述步骤S304可以包括：Specifically, the above step S304 may include:

步骤D1：判断整合后的上下文信息中是否存在白名单、是否存在标签信息、是否存在失陷检测判定级别分数、是否存在高级持续性威胁APT家族、是否存在异常字段以及是否存在历史记录分值。若至少有一个存在，则执行步骤D2，若均不存在，则执行步骤D3。Step D1: Determine whether the integrated context information contains a whitelist entry, tag information, a compromise detection level score, an advanced persistent threat (APT) family, an abnormal field, and a historical record score. If at least one exists, execute step D2; if none exists, execute step D3.

步骤D2：将判断出存在的内容对应的分值添加至相应邮箱实体对应的信誉分值中。Step D2: Add the score corresponding to the content determined to exist to the reputation score corresponding to the corresponding mailbox entity.

步骤D3：确定相应邮箱实体的信誉分值为0。Step D3: Determine that the reputation score of the corresponding mailbox entity is 0.

也就是说，将某一个邮箱实体整合后的上下文信息依次进行判断，即，先判断上下文信息是否存在于白名单中，再判断上下文信息中是否有标签，再判断上下文信息中是否有IOC得分，再判断上下文信息中是否存在APT标识，再判断上下文信息中是否还存在其它非正常字段(例如：IP地址行为异常、活动范围异常等)，最后再判断上下文信息中是否有之前已经进行过研判的研判结果。That is to say, the integrated context information of a mailbox entity is checked in sequence: first whether the context information appears in the whitelist, then whether it contains a tag, then whether it contains an IOC score, then whether it contains an APT identifier, then whether it contains other abnormal fields (for example, abnormal IP address behavior or an abnormal range of activity), and finally whether it contains the result of an earlier analysis.

在上述判断的过程中，如果出现一个判断结果为是，则在基础分值的基础上增加一个该“是”的项目对应的分值。如果从头至尾都没有出现过“是”的判断结果，那么，最终的分值就是基础分值。基础分值可以是0。In the above judgment process, if a judgment result is yes, then the score corresponding to the "yes" item will be increased on the basis of the basic score. If there is no "yes" judgment result from beginning to end, then the final score is the basic score. The basic score can be 0.
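The additive scoring of steps D1 to D3 can be sketched as follows. The application does not publish the per-item scores, so every number in `ITEM_SCORES` is a hypothetical placeholder; treating the whitelist as a negative contribution and clamping the result to 0-100 are likewise assumptions of this sketch.

```python
BASE_SCORE = 0

# Hypothetical per-item scores, listed in the order the checks are described.
ITEM_SCORES = {
    "whitelist": -50,       # assumption: a whitelist hit lowers the score
    "tag": 20,
    "ioc_score": 30,
    "apt_family": 40,
    "abnormal_field": 10,
    "history_score": 10,
}


def reputation_score(context: dict[str, object]) -> int:
    """Start from the base score and add the score of every item present in the context."""
    score = BASE_SCORE
    for item, points in ITEM_SCORES.items():
        if context.get(item):   # a "yes" result for this check (step D2)
            score += points
    # If nothing is present the base score is returned unchanged (step D3).
    return max(0, min(100, score))


score = reputation_score({"tag": ["phish"], "apt_family": "ExampleFamily"})
```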

举例来说，对于不同的邮箱实体，都会关联威胁情报接口进行查询，提取其中的类型。例如：Phish钓鱼类、fraud欺诈类、APT类等。根据不同的标记结果进入分数计算逻辑。For example, for different mailbox entities, threat intelligence interfaces will be associated for query and the types will be extracted, such as phishing, fraud, APT, etc. The score calculation logic is entered according to different marking results.

代码如下：The code is as follows:

在实际应用中，可以根据不同邮箱实体得到的信誉分值，将不同的邮箱实体归入到不同的信誉等级中。例如：In actual applications, different mailbox entities can be classified into different reputation levels according to the reputation scores obtained by different mailbox entities. For example:

信誉分值为0-30，表示安全。判定此邮箱信誉值低，提供可能的威胁信息，及邮箱画像信息，判定为安全邮箱。A reputation score of 0-30 indicates safe. The mailbox's score is low; any possible threat information and the mailbox profile information are provided, and the mailbox is judged to be safe.

信誉分值为30-40，表示未知。需要更多信息定位研判，需依据邮箱的画像信息和详情字段判定是否为恶意。A reputation score of 30-40 indicates unknown. More information is needed for the analysis; whether the mailbox is malicious must be decided from its profile information and detail fields.

信誉分值为40-80，表示可疑。有可能是恶意威胁，根据上下文研判可能为恶意邮箱。A reputation score of 40-80 indicates suspicious. The mailbox may be a malicious threat and, judging from the context, is possibly a malicious mailbox.

信誉分值为80-100，表示恶意。信誉度高，提供了研判依据以及上下文信息，多为恶意邮箱地址。A reputation score of 80-100 indicates malicious. The score is high; the basis for the judgment and the context information are provided, and the address is in most cases a malicious mailbox address.
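The mapping from score to level can be sketched as a chain of threshold checks. The ranges in the text share their boundary values (30, 40 and 80), so this sketch assumes each boundary belongs to the higher band; that tie-breaking rule and the function name `reputation_level` are assumptions.

```python
def reputation_level(score: int) -> str:
    """Map a 0-100 reputation score to one of the four levels named in the text."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    if score < 30:
        return "safe"
    if score < 40:
        return "unknown"
    if score < 80:
        return "suspicious"
    return "malicious"


level = reputation_level(60)
```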

由上述内容可知，在标准信誉评分的基础上，采用动态评价方式，能够持续更新邮箱实体的信誉研判结果，提升邮箱信誉评价的准确性，使得邮箱信誉情报库中的情报更加准确，提升分析师基于邮箱信誉情报库提供的邮箱实体的上下文信息进行有效精准的溯源定位。From the above content, it can be seen that applying a dynamic evaluation method on top of standard reputation scoring makes it possible to continuously update the reputation analysis results of mailbox entities, improves the accuracy of mailbox reputation evaluation, makes the intelligence in the mailbox reputation intelligence library more accurate, and helps analysts trace and locate sources effectively and precisely based on the context information that the library provides.

S305：将各个邮箱实体的研判结果加入到相应的整合后的上下文信息中并存储于邮箱信誉情报库。S305: Add the analysis results of each mailbox entity to the corresponding integrated context information and store it in the mailbox reputation intelligence library.

在得到各邮箱实体的研判结果，即信誉评价后，就可以将研判结果与相应邮箱的上下文信息一并存储到邮箱信誉情报库中，以便后续分析师需要获得该邮箱实体的相关信息时，可以直接将该邮箱实体的研判结果和上下文信息一并提供给分析师。After obtaining the analysis results, i.e., reputation evaluation, of each mailbox entity, the analysis results and the context information of the corresponding mailbox can be stored in the mailbox reputation intelligence library together, so that when subsequent analysts need to obtain relevant information of the mailbox entity, the analysis results and context information of the mailbox entity can be directly provided to the analysts.

具体来说，上述步骤S305可以包括：Specifically, the above step S305 may include:

步骤E：将各个邮箱实体的研判结果加入到相应的整合后的上下文信息中并按照信誉等级、基础信息、恶意活动标识和邮箱行为画像的分类方式存储于邮箱信誉情报库。Step E: Add the analysis results of each mailbox entity to the corresponding integrated context information and store it in the mailbox reputation intelligence library according to the classification method of reputation level, basic information, malicious activity identification and mailbox behavior portrait.

也就是说，针对每一个邮箱实体，都需要按照信誉等级、基础信息、恶意活动标识和邮箱行为画像的分类方式存储于邮箱信誉情报库。将来分析师需要获取邮箱实体的相关信息时，输出的也是按照信誉等级、基础信息、恶意活动标识和邮箱行为画像这样的分类方式输出的信息。That is to say, for each mailbox entity, it needs to be stored in the mailbox reputation intelligence database according to the classification method of reputation level, basic information, malicious activity identification and mailbox behavior portrait. In the future, when analysts need to obtain relevant information about mailbox entities, the output will also be information classified according to the reputation level, basic information, malicious activity identification and mailbox behavior portrait.

信誉等级，也就是信誉评价，可以分为：恶意、可疑、未知、安全。The reputation level, or reputation evaluation, can be divided into: malicious, suspicious, unknown, and safe.

基础信息，可以是指邮箱中一些比较容易获得的信息，可以包括：邮箱类型(是否为企业邮箱，可以根据icp备案信息获得，例如：info@reddit.com，info@qianxin.com。是否为公共/免费邮箱，例如：person@yahoo.com)、MX记录(邮箱的域是否有有效的MX记录)、DMARC记录(需和研究院确认数据源)、最早看到时间(最远一次注册行动/收发邮件时间)、最近活跃时间(最近一次注册行动/收发邮件时间)。Basic information refers to information about the mailbox that is relatively easy to obtain. It may include: mailbox type (whether it is a corporate mailbox, which can be determined from ICP filing information, for example info@reddit.com or info@qianxin.com; whether it is a public/free mailbox, for example person@yahoo.com), MX record (whether the mailbox's domain has a valid MX record), DMARC record (the data source needs to be confirmed with the research institute), first seen time (the time of the earliest registration action or email sent/received), and most recent active time (the time of the latest registration action or email sent/received).

恶意活动标识，可以包括：垃圾邮件、伪造邮件、Whois信息关联域名的恶意标签(例如：APT、CS)、自定义运营标签。Malicious activity identification may include: spam, forged emails, malicious labels of domain names associated with Whois information (for example: APT, CS), and custom operation labels.

邮箱行为画像，可以包括：应用app、是否泄露(是or否)、其它(例如：是否在公开网站出现过)。其中，应用app可以包括：探测是否在应用app有注册/访问记录，例如：京东、淘宝常用购物网站等，社交账号(探测是否在某社交账号注册记录，例如：推特、微博等主流社交网站)，主站注册信息(是否在主流站点注册账号，例如：gmail、google)。The mailbox behavior profile may include: applications, whether the mailbox has been leaked (yes or no), and other items (for example, whether it has appeared on a public website). The applications item may include: whether the mailbox has registration or access records in applications (for example, common shopping sites such as JD.com and Taobao), social accounts (whether the mailbox has a registration record on mainstream social networks such as Twitter and Weibo), and main-site registration information (whether an account is registered on mainstream sites such as gmail and google).
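To make the four-way classification of step E concrete, a stored record could be organized as in the sketch below. This is an illustrative assumption only: the function name `build_record`, the key names, and all sample values are hypothetical and are not taken from the application.

```python
import json


def build_record(email: str, level: str, basic_info: dict,
                 malicious_activity: list, behavior_profile: dict) -> dict:
    """Group one mailbox entity's information under the four categories of step E."""
    return {
        "email": email,
        "reputation_level": level,                 # malicious / suspicious / unknown / safe
        "basic_info": basic_info,                  # mailbox type, MX, DMARC, first seen, last active
        "malicious_activity": malicious_activity,  # e.g. spam, forged mail, custom tags
        "behavior_profile": behavior_profile,      # app registrations, leak status, other
    }


# Made-up sample values.
record = build_record(
    "person@example.com",
    "suspicious",
    basic_info={"mailbox_type": "public", "has_mx": True,
                "first_seen": "2022-01-01", "last_active": "2022-12-01"},
    malicious_activity=["spam"],
    behavior_profile={"leaked": False, "seen_on_public_sites": True},
)
serialized = json.dumps(record, ensure_ascii=False, sort_keys=True)
```

Keeping the same four top-level keys for storage and for query output means the analyst sees the information in the same grouping in which it was stored.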

To show more clearly which specific information is output, a figure is used for illustration. Figure 5 is a schematic diagram of the context information of a mailbox entity output in an embodiment of the present application. As shown in Figure 5, the output context information of the mailbox entity, that is, the email intelligence enrichment information, falls into four major categories: reputation level, basic information, malicious activity identifiers, and mailbox behavior profile. Some of these categories are further divided into subcategories, which are given explicitly in Figure 5 and are not repeated here.

The code for field extraction and for producing the enriched content of the processed mailbox reputation intelligence is as follows:

As can be seen from the above, outputting the information related to a mailbox entity under the categories of reputation level, basic information, malicious activity identifiers, and mailbox behavior profile makes the enriched context information clearer and easier for analysts to review. A mailbox reputation intelligence library built in this way provides analysts with richer context information about mailbox entities, so that they can trace and locate sources efficiently and accurately based on the enriched context information.
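The four-category layout described above can be illustrated with a minimal Python sketch. This is not the code from the application; all class and field names (`MailboxContext`, `BasicInfo`, `BehaviorProfile`, and their attributes) are hypothetical and chosen only to mirror the categories named in the text.

```python
from dataclasses import dataclass, field, asdict
from typing import List, Optional


@dataclass
class BasicInfo:
    # Hypothetical field names mirroring the "basic information" category.
    mailbox_type: str = "unknown"          # e.g. "corporate" or "public/free"
    has_valid_mx: Optional[bool] = None    # whether the domain has a valid MX record
    dmarc_record: Optional[str] = None
    first_seen: Optional[str] = None       # earliest seen time
    last_active: Optional[str] = None      # most recent active time


@dataclass
class BehaviorProfile:
    # Hypothetical field names mirroring the "mailbox behavior profile" category.
    registered_apps: List[str] = field(default_factory=list)
    social_accounts: List[str] = field(default_factory=list)
    mainstream_sites: List[str] = field(default_factory=list)
    leaked: Optional[bool] = None
    seen_on_public_sites: Optional[bool] = None


@dataclass
class MailboxContext:
    mailbox: str
    reputation_level: str = "unknown"      # malicious / suspicious / unknown / safe
    basic_info: BasicInfo = field(default_factory=BasicInfo)
    malicious_activity: List[str] = field(default_factory=list)
    behavior_profile: BehaviorProfile = field(default_factory=BehaviorProfile)


# Example record using the public/free mailbox mentioned in the text.
record = MailboxContext(
    mailbox="person@yahoo.com",
    basic_info=BasicInfo(mailbox_type="public/free", has_valid_mx=True),
    behavior_profile=BehaviorProfile(leaked=False),
)
record_as_dict = asdict(record)  # nested dict, ready for serialization
```

Keeping each category as its own nested structure means an analyst-facing view can render the four groups separately without knowing the individual fields.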

Finally, two specific examples are used to further illustrate how the mailbox reputation intelligence library is produced and how it is deployed in the embodiments of the present application.

Figure 6 is a schematic diagram of the production process of the mailbox reputation intelligence library in an embodiment of the present application. As shown in Figure 6, mailbox entities are collected from information sources such as open source data, Whois records, IOC information, TI data, crawler data, black/white lists, and others. The collected mailboxes are processed in real time, that is, parsed, extracted, converted, and normalized to a unified format. They are then passed through kafka, processed for ingestion, and finally stored in the db.
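The flow in Figure 6 can be sketched roughly as follows. This is a simplified illustration under stated assumptions: an in-process `queue.Queue` stands in for kafka, an in-memory SQLite table stands in for the db, and the function names, table name, and the `email`/`mail` input keys are all hypothetical.

```python
import json
import queue
import sqlite3


def normalize(source: str, raw: dict) -> dict:
    """Parse/extract/convert one raw record into a unified format."""
    # Assumption: sources expose the address under "email" or "mail".
    mailbox = (raw.get("email") or raw.get("mail") or "").strip().lower()
    return {
        "mailbox": mailbox,
        "source": source,
        "payload": json.dumps(raw, sort_keys=True),
    }


def run_pipeline(records, conn) -> int:
    """Normalize records, buffer them, and store them; returns rows stored."""
    buffer = queue.Queue()  # stand-in for the kafka stage
    for source, raw in records:
        item = normalize(source, raw)
        if "@" in item["mailbox"]:  # drop records without a mailbox entity
            buffer.put(item)

    conn.execute(
        "CREATE TABLE IF NOT EXISTS mailbox_raw "
        "(mailbox TEXT, source TEXT, payload TEXT)"
    )
    stored = 0
    while not buffer.empty():
        item = buffer.get()
        conn.execute(
            "INSERT INTO mailbox_raw VALUES (?, ?, ?)",
            (item["mailbox"], item["source"], item["payload"]),
        )
        stored += 1
    conn.commit()
    return stored
```

A call such as `run_pipeline([("whois", {"email": "admin@example.com"})], sqlite3.connect(":memory:"))` exercises the whole path; in a real deployment the buffer and the store would be separate services.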

Figure 7 is a schematic diagram of the deployment process of the mailbox reputation intelligence library in an embodiment of the present application. As shown in Figure 7, a unified interface, api.xxx.com, is exposed to analysts. Incoming data entities are load-balanced by lvs and distributed to different gateways such as kong1, kong2, or kong3. They are then dispatched to different mailbox reputation services; that is, email-reputation-service retrieves the corresponding context information from the mailbox reputation intelligence library and outputs it.

Based on the same inventive concept, as an implementation of the above acquisition method, an embodiment of the present application further provides a device for acquiring mailbox context information. Figure 8 is a schematic structural diagram of the device for acquiring mailbox context information in an embodiment of the present application. As shown in Figure 8, the device may include:

a receiving module 801, configured to acquire the mailbox entity to be identified;

a matching module 802, configured to match the mailbox entity to be identified against the mailbox entities in a mailbox reputation intelligence library, wherein the mailbox reputation intelligence library contains various mailbox entities and their context information, and the context information of a mailbox entity in the mailbox reputation intelligence library contains more content than the context information extracted from that mailbox entity alone;

if the match succeeds, control passes to a first output module 803, which is configured to output the mailbox entity in the mailbox reputation intelligence library that matches the mailbox entity to be identified, together with its context information;

if the match fails, control passes to a second output module 804, which is configured to extract the context information from the mailbox entity to be identified and to store the extracted context information and the mailbox entity to be identified in the mailbox reputation intelligence library.
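The match-then-output flow of modules 801 to 804 can be sketched as below. This is only an illustration: the library is modeled as a plain dictionary keyed by the normalized address, and `extract_context` is a hypothetical helper that recovers only what the address itself reveals.

```python
from typing import Dict, Optional

# Assumption: the reputation library is modeled as {mailbox: context dict}.
ReputationLibrary = Dict[str, dict]


def extract_context(mailbox: str) -> dict:
    """Context recoverable from the address alone (hypothetical minimal extraction)."""
    local_part, _, domain = mailbox.partition("@")
    return {"local_part": local_part, "domain": domain}


def lookup(mailbox: str, library: ReputationLibrary) -> Optional[dict]:
    """Return the matched entity and its context, or None when no match exists."""
    key = mailbox.strip().lower()
    if key in library:
        # Match succeeded: output the stored entity with its enriched context.
        return {"mailbox": key, "context": library[key]}
    # Match failed: store what can be extracted from the entity itself,
    # and output nothing.
    library[key] = extract_context(key)
    return None
```

Exact-key matching is the simplest possible choice here; the text does not specify the matching rule, so fuzzier matching would be an equally valid reading.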

Further, as a refinement and extension of the above mailbox reputation intelligence library, a device for generating the mailbox reputation intelligence library is also provided. Figure 9 is a schematic structural diagram of the device for generating the mailbox reputation intelligence library in an embodiment of the present application. As shown in Figure 9, the device may include:

a collection module 901, configured to collect multiple mailbox entities from multiple information sources;

an extraction module 902, configured to extract context information from each of the multiple mailbox entities, wherein the context information is relevant to the reputation evaluation of the mailbox entity;

an integration module 903, configured to integrate the context information of the same mailbox entity;

an evaluation module 904, configured to perform reputation evaluation on the integrated context information and generate an evaluation result for each mailbox entity;

a storage module 905, configured to add the evaluation result of each mailbox entity to the corresponding integrated context information and store it in the mailbox reputation intelligence library.
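The integration step performed by module 903 can be pictured as merging the context collected for one mailbox entity from several sources. The sketch below is an assumption about one reasonable merge strategy (union of values per field); the text does not prescribe how conflicts are resolved, and the function name `integrate` is hypothetical.

```python
from collections import defaultdict


def integrate(observations):
    """Merge context dicts that belong to the same mailbox entity.

    `observations` is an iterable of (mailbox, context) pairs, where each
    context maps a field name to a single value or a list of values.
    """
    merged = defaultdict(lambda: defaultdict(set))
    for mailbox, context in observations:
        for key, value in context.items():
            values = value if isinstance(value, (list, set, tuple)) else [value]
            merged[mailbox.lower()][key].update(values)
    # Convert sets to sorted lists so the result is deterministic.
    return {m: {k: sorted(v) for k, v in ctx.items()} for m, ctx in merged.items()}
```

Taking the union keeps every observed value, which suits later reputation evaluation better than letting the most recent source overwrite earlier ones.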

In other embodiments of the present application, the extraction module is specifically configured to extract at least one of the sender, email labels, Internet Protocol (IP) address, email subject list, associated samples, reverse lookup result of the mailbox entity's Whois registration information, malicious labels, domain name, and active time attribute from the multiple mailbox entities, as the context information corresponding to each of the multiple mailbox entities.

In other embodiments of the present application, the extraction module is specifically configured to: extract the mail_info, target, file, and type fields from the raw data of the identification reports of the multiple mailbox entities, to obtain first sub-information corresponding to each mailbox entity; extract the message-digest algorithm (MD5) information of the multiple mailbox entities, to obtain second sub-information corresponding to each mailbox entity; extract the email-related information from the sandbox data messages of the multiple mailbox entities, to obtain third sub-information corresponding to each mailbox entity; extract the mail exchange (MX) records, Sender Policy Framework (SPF) information, and DMARC information of the multiple mailbox entities, to obtain fourth sub-information corresponding to each mailbox entity; run the first, second, third, and fourth sub-information through the compromise detection module, to obtain a detection result corresponding to each mailbox entity; and, based on the first, second, third, and fourth sub-information and the detection results, determine the sender, labels, Internet Protocol (IP) address, email subject list, associated samples, reverse lookup result of the mailbox entity's Whois registration information, malicious labels, domain name, and active time attribute corresponding to each mailbox entity, and use them as the context information corresponding to that mailbox entity.

In other embodiments of the present application, the generating device further includes a dynamic update module. The dynamic update module is configured to receive a newly added extraction rule and to extract context information from each of the multiple mailbox entities according to the newly added extraction rule.

In other embodiments of the present application, the dynamic update module is specifically configured to: extract the domain name from each of the multiple mailbox entities; determine whether the domain name is associated with a known malicious domain name; if so, use that domain name together with the known malicious domain name as context information of the corresponding mailbox entity; if not, use the extracted domain name as context information of the corresponding mailbox entity.
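The domain-association rule described above can be sketched as follows. How "association" is established is not specified in the text, so the sketch assumes a precomputed mapping `associations` from a domain to the known malicious domains linked to it; that mapping, the function name, and the output keys are all hypothetical.

```python
def domain_context(mailbox: str, associations: dict) -> dict:
    """Build domain-related context for one mailbox entity.

    `associations` maps a domain name to the set of known malicious
    domain names it is associated with (assumed to be precomputed).
    """
    domain = mailbox.rsplit("@", 1)[-1].lower()
    related = sorted(associations.get(domain, ()))
    if related:
        # Associated: keep the domain together with the malicious domains.
        return {"domain": domain, "associated_malicious_domains": related}
    # Not associated: the extracted domain alone is the context.
    return {"domain": domain}
```

Because the rule is an ordinary function over the stored entities, a newly received extraction rule of this kind could be applied to the existing library without changing the rest of the pipeline.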

In other embodiments of the present application, the evaluation module is specifically configured to determine whether the integrated context information contains a whitelist entry, label information, a compromise detection level score, an advanced persistent threat (APT) family, an abnormal field, and a historical record score; if at least one of these is present, the score corresponding to each item found to be present is added to the reputation score of the corresponding mailbox entity; if none is present, the reputation score of the corresponding mailbox entity is determined to be 0.
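The additive scoring described above can be illustrated with a short sketch. The six factors come from the text, but the key names and every numeric weight in `SCORE_BY_FACTOR` are invented for illustration; the application does not disclose the actual values in this passage.

```python
# Hypothetical weights: the text names the factors but not their scores.
SCORE_BY_FACTOR = {
    "whitelist": -50,
    "labels": 20,
    "compromise_detection_score": 30,
    "apt_family": 40,
    "abnormal_fields": 10,
    "history_score": 10,
}


def reputation_score(context: dict) -> int:
    """Sum the scores of the factors present in the integrated context.

    A factor counts as present when its key holds a truthy value, so an
    empty list or None is treated as absent. With no factor present the
    sum over an empty list yields 0, matching the rule in the text.
    """
    present = [name for name in SCORE_BY_FACTOR if context.get(name)]
    return sum(SCORE_BY_FACTOR[name] for name in present)
```

Mapping the resulting score onto the four reputation levels would need thresholds, which are likewise not given here and are therefore left out of the sketch.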

In other embodiments of the present application, the storage module is specifically configured to add the evaluation result of each mailbox entity to the corresponding integrated context information and store it in the mailbox reputation intelligence library, classified by reputation level, basic information, malicious activity identifiers, and mailbox behavior profile.

In other embodiments of the present application, the receiving module is specifically configured to collect multiple mailbox entities from open source data, Whois (domain name query protocol) records, indicator of compromise (IOC) information, threat intelligence (TI) data, large-scale web crawler data, and black/white lists.

It should be noted that the above description of the acquisition device and generation device embodiments is similar to the description of the acquisition method and generation method embodiments, and these embodiments have similar beneficial effects. For technical details not disclosed in the acquisition device and generation device embodiments of the present application, refer to the description of the acquisition method and generation method embodiments.

Based on the same inventive concept, as an implementation of the above method, an embodiment of the present application further provides a computer storage medium. A computer program is stored on the computer storage medium, and the above method can be implemented when the program is executed by a processor.

It should be noted that the above description of the computer storage medium embodiment is similar to the description of the method embodiments, and the embodiment has similar beneficial effects. For technical details not disclosed in the computer storage medium embodiment of the present application, refer to the description of the method embodiments.

Based on the same inventive concept, as an implementation of the above method, an embodiment of the present application further provides an electronic device. The electronic device may include a memory, a processor, and a computer program stored in the memory and executable on the processor, and the processor can implement the above method when executing the program.

It should be noted that the above description of the electronic device embodiment is similar to the description of the method embodiments, and the embodiment has similar beneficial effects. For technical details not disclosed in the electronic device embodiment of the present application, refer to the description of the method embodiments.

The above are only specific implementations of the present application, but the protection scope of the present application is not limited thereto. Any change or substitution that a person skilled in the art could readily conceive within the technical scope disclosed in the present application shall fall within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims

1. A method for acquiring context information of a mailbox, the method comprising:

acquiring a mailbox entity to be identified;

matching the mailbox entity to be identified with mailbox entities in a mailbox reputation intelligence library, wherein the mailbox reputation intelligence library comprises various mailbox entities and context information thereof, and the content of the context information of one mailbox entity in the mailbox reputation intelligence library is more than that of the context information extracted from the one mailbox entity alone;

if the matching is successful, outputting the mailbox entity in the mailbox reputation intelligence library that matches the mailbox entity to be identified, together with the context information thereof;

if the matching fails, outputting null.

2. The method of claim 1, wherein prior to acquiring the mailbox entity to be identified, the method further comprises:

collecting a plurality of mailbox entities through a plurality of information sources;

extracting context information from the plurality of mailbox entities respectively, wherein the context information is related to reputation evaluation of the mailbox entities;

integrating the context information of the same mailbox entity;

performing reputation evaluation on the integrated context information to generate an evaluation result for each mailbox entity;

and adding the evaluation result of each mailbox entity to the corresponding integrated context information and storing the integrated context information in a mailbox reputation intelligence library.

3. The method of claim 2, wherein the extracting context information from the plurality of mailbox entities, respectively, comprises:

extracting at least one of a sender, a mail label, an Internet Protocol (IP) address, a mail subject list, an associated sample, a reverse lookup result of mailbox entity Whois registration information, a malicious label, a domain name and an active time attribute from the plurality of mailbox entities as context information respectively corresponding to the plurality of mailbox entities.

4. The method of claim 3, wherein extracting at least one of a sender, a label, an Internet Protocol (IP) address, a mail subject list, an associated sample, a reverse lookup result of mailbox entity Whois registration information, a malicious label, a domain name and an active time attribute from the plurality of mailbox entities as the context information respectively corresponding to the plurality of mailbox entities comprises:

extracting the mail_info, target, file, and type fields from the raw data of the identification reports of the plurality of mailbox entities respectively, to obtain first sub-information respectively corresponding to the plurality of mailbox entities;

extracting message-digest algorithm (MD5) information of the plurality of mailbox entities respectively, to obtain second sub-information respectively corresponding to the plurality of mailbox entities;

extracting mail-related information from the sandbox data messages of the plurality of mailbox entities respectively, to obtain third sub-information respectively corresponding to the plurality of mailbox entities;

extracting mail exchange (MX) records, Sender Policy Framework (SPF) information and DMARC information of the plurality of mailbox entities respectively, to obtain fourth sub-information respectively corresponding to the plurality of mailbox entities;

detecting the first sub-information, the second sub-information, the third sub-information and the fourth sub-information through a compromise detection module, to obtain detection results respectively corresponding to the plurality of mailbox entities;

based on the first sub-information, the second sub-information, the third sub-information, the fourth sub-information and the detection results, determining a sender, a label, an Internet Protocol (IP) address, a mail subject list, an associated sample, a reverse lookup result of mailbox entity Whois registration information, a malicious label, a domain name and an active time attribute respectively corresponding to the plurality of mailbox entities, and using the same as context information respectively corresponding to the plurality of mailbox entities.

5. The method of claim 2, wherein after adding the evaluation result of each mailbox entity to the corresponding integrated context information and storing it in the mailbox reputation intelligence library, the method further comprises:

receiving a new extraction rule;

and respectively extracting the context information from the mailbox entities according to the newly added extraction rule.

6. The method of claim 5, wherein extracting context information from the plurality of mailbox entities according to the new extraction rule, respectively, comprises:

extracting domain names in the mailbox entities respectively;

judging whether the domain name has an association relationship with a known malicious domain name or not;

if yes, the same domain name and the known malicious domain name are used as the context information of the corresponding mailbox entity;

if not, the extracted domain name is used as the context information of the corresponding mailbox entity.

7. The method of claim 2, wherein performing reputation evaluation on the integrated context information to generate an evaluation result for each mailbox entity comprises:

judging whether the integrated context information contains a whitelist entry, label information, a compromise detection level score, an advanced persistent threat (APT) family, an abnormal field and a history score;

if at least one exists, adding the score corresponding to the content judged to exist to the reputation score of the corresponding mailbox entity;

and if none exists, determining that the reputation score of the corresponding mailbox entity is 0.

8. The method of claim 2, wherein adding the evaluation result of each mailbox entity to the corresponding integrated context information and storing it in the mailbox reputation intelligence library comprises:

adding the evaluation result of each mailbox entity to the corresponding integrated context information and storing it in the mailbox reputation intelligence library, classified by reputation level, basic information, malicious activity identification and mailbox behavior profile.

9. The method of claim 2, wherein the collecting a plurality of mailbox entities via a plurality of information sources comprises:

collecting a plurality of mailbox entities from open source data, Whois (domain name query protocol) records, indicator of compromise (IOC) information, threat intelligence (TI) data, large-scale web crawler data and black/white lists.

10. A method for generating a mailbox reputation intelligence library, the method comprising:

integrating the context information of the same mailbox entity;

11. An apparatus for acquiring context information of a mailbox, the apparatus comprising:

the receiving module is used for acquiring a mailbox entity to be identified;

the matching module is used for matching the mailbox entity to be identified with the mailbox entities in the mailbox reputation intelligence library, wherein the mailbox reputation intelligence library comprises various mailbox entities and context information thereof, and the content of the context information of one mailbox entity in the mailbox reputation intelligence library is more than that of the context information extracted from the one mailbox entity alone;

if the matching is successful, a first output module is entered, wherein the first output module is used for outputting the mailbox entity in the mailbox reputation intelligence library that matches the mailbox entity to be identified, together with the context information thereof;

if the matching fails, a second output module is entered, wherein the second output module is used for extracting the context information from the mailbox entity to be identified and storing the extracted context information and the mailbox entity to be identified in the mailbox reputation intelligence library.

12. A device for generating a mailbox reputation intelligence library, the device comprising:

the collection module is used for collecting a plurality of mailbox entities through a plurality of information sources;

the extraction module is used for respectively extracting context information from the plurality of mailbox entities, and the context information is related to the reputation evaluation of the mailbox entities;

the integration module is used for integrating the context information of the same mailbox entity;

the evaluation module is used for performing reputation evaluation on the integrated context information and generating an evaluation result for each mailbox entity;

and the storage module is used for adding the evaluation result of each mailbox entity to the corresponding integrated context information and storing the integrated context information in the mailbox reputation intelligence library.

13. A computer storage medium having stored thereon a computer program, which when executed by a processor, is adapted to carry out the method of any of claims 1-10.

14. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor is operable to implement a method as claimed in any one of claims 1 to 10 when the program is executed by the processor.