HK1248895B

HK1248895B - Risk characteristic screening and describing message generation method and device and electronic equipment

Info

Publication number: HK1248895B
Application number: HK18108436.5A
Authority: HK
Inventors: 张鹏; 印晓华; 张向阳; 薛峰; 顾曦; 郭倩婷; 屠剑威
Original assignee: 创新先进技术有限公司
Filing date: 2018-06-29
Publication date: 2021-04-23

Description

Risk feature screening, description message generation method, device and electronic equipment

Technical Field

This specification relates to the field of computer technology, and in particular to methods, apparatuses, and electronic devices for risk feature screening and description message generation.

Background Art

With the rapid development of internet finance, the number of internet financial transactions is growing rapidly. Among this large volume of transactions, some individuals may carry out illegal transactions such as money laundering. Staff therefore need to find suspicious transactions among a large number of transaction records, generate a corresponding suspicious transaction description message for each, and report it to the relevant regulatory authorities. These suspicious transactions may also be called risk events.

In the prior art, after suspicious transaction data is received, staff usually write the message describing the suspicious transaction manually, based on the data and a predefined message template, and the length of the message is limited.

Given the prior art, a solution is needed that can generate more informative description messages for suspicious transactions under a message length constraint.

Summary of the Invention

The embodiments of this specification provide risk feature screening and description message generation methods, apparatuses, and electronic devices to solve the following technical problem: a solution is needed that can generate more informative description messages for suspicious transactions under a message length constraint.

To solve the above technical problem, the embodiments of this specification are implemented as follows:

An embodiment of this specification provides a risk feature screening method, including:

obtaining a feature weight for each of a plurality of risk features, the feature weights being either derived from a classification model trained on sample events or predefined, the classification model being used to determine risk events;

selecting at least some of the risk features according to the feature weights and a predetermined condition, the predetermined condition being used to constrain the length of a message generated from the risk features.

An embodiment of this specification provides a description message generation method, including:

obtaining an event to be described;

determining the screened risk features;

generating a description message for the event to be described according to the screened risk features;

wherein screening the risk features includes: obtaining a feature weight for each of a plurality of risk features, and selecting the risk features according to the feature weights and a predetermined condition, the feature weights being either derived from a classification model trained on sample events or predefined, the classification model being used to determine risk events, and the predetermined condition being used to constrain the length of a message generated from the risk features.

An embodiment of this specification provides a risk feature screening apparatus, including:

an acquisition module that obtains a feature weight for each of a plurality of risk features, the feature weights being either derived from a classification model trained on sample events or predefined, the classification model being used to determine risk events;

a screening module that selects at least some of the risk features according to the feature weights and a predetermined condition, the predetermined condition being used to constrain the length of a message generated from the risk features.

An embodiment of this specification provides a description message generation apparatus, including:

an acquisition module that obtains an event to be described;

a determination module that determines the screened risk features;

a generation module that generates a description message for the event to be described according to the screened risk features;

An embodiment of this specification provides an electronic device for risk feature screening, including:

at least one processor; and

a memory communicatively connected to the at least one processor; wherein

the memory stores instructions executable by the at least one processor, the instructions being executed by the at least one processor to enable the at least one processor to:

An embodiment of this specification provides an electronic device for description message generation, including:

at least one processor; and

obtain an event to be described;

At least one of the above technical solutions adopted in the embodiments of this specification can achieve the following beneficial effects: a trained classification model can be used to determine the feature weight of each risk feature, and a description message can be generated for the event to be described according to the feature weights and a predetermined condition that constrains the length of the message generated from the risk features, so that the generated description message is more informative. The event to be described may be, for example, a suspicious transaction such as a suspected money laundering transaction.

Brief Description of the Drawings

To explain the technical solutions in the embodiments of this specification or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below illustrate only some of the embodiments recorded in this specification; a person of ordinary skill in the art can derive other drawings from them without creative effort.

FIG. 1 is a schematic diagram of an overall architecture involved in the solution of this specification in a practical application scenario;

FIG. 2 is a schematic flowchart of a risk feature screening method provided in an embodiment of this specification;

FIG. 3 is a schematic flowchart of a description message generation method provided in an embodiment of this specification;

FIG. 4 is a schematic diagram of a partial screenshot of a description message provided in an embodiment of this specification;

FIG. 5 is a schematic diagram of an automatic message algorithm provided in an embodiment of this specification;

FIG. 6 is a schematic diagram of a suspicious transaction identification flow in a practical application scenario provided in an embodiment of this specification;

FIG. 7 is a schematic structural diagram of a risk feature screening apparatus, corresponding to FIG. 2, provided in an embodiment of this specification;

FIG. 8 is a schematic structural diagram of a description message generation apparatus, corresponding to FIG. 3, provided in an embodiment of this specification.

Detailed Description

The embodiments of this specification provide risk feature screening and description message generation methods, apparatuses, and electronic devices.

To help those skilled in the art better understand the technical solutions in this specification, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. The described embodiments are only some, not all, of the embodiments of this application. All other embodiments obtained by a person of ordinary skill in the art from the embodiments of this specification without creative effort fall within the scope of protection of this application.

For ease of understanding, the idea behind the solution of this specification is analyzed first.

Without a message length constraint, the description message can cover all information points of a suspicious transaction, where each information point reflects the data of one risk feature of the transaction; for example, an information point is a sub-message generated from a risk feature. The set of all risk features is denoted S.

With a message length constraint, the description message can usually cover only part of the risk feature data of a suspicious transaction, not all of it; otherwise the message would exceed the length limit. To make the generated description message as informative as possible, the risk features must therefore be screened to find the subset with the highest reference value. Denote this subset S' (S' ⊆ S), and suppose the area under the receiver operating characteristic curve (Area Under the ROC Curve, AUC) of the classification model is used to measure the reference value of S'. The ideal goal is then to find the S' with the largest corresponding AUC.

This ideal goal is a combinatorial optimization problem. When there are many risk features, the amount of computation is too large to be practical. The solution of this specification therefore uses a greedy search strategy to solve the problem approximately, settling for a locally optimal solution, which reduces computation and improves efficiency.

The solution of this specification can be used to select, from a set of candidate risk features, those with relatively high reference value. It can further be used to generate description messages for risk events, such as suspicious transactions, from the selected risk features.

FIG. 1 is a schematic diagram of an overall architecture involved in the solution of this specification in a practical application scenario. The architecture includes at least one device. The workflow mainly includes: determining the risk features to be screened and selecting at least some of them; and inputting an event to be described into the device used for generating description messages, which generates a description message from the event and the selected risk features. The at least one device may include a classification model for determining risk events.

Based on the above idea and overall architecture, the solution of this specification is described in detail below.

An embodiment of this specification provides a risk feature screening method. As shown in FIG. 2, the flow of the method may include the following steps:

S202: Obtain a feature weight for each of a plurality of risk features, where the feature weights are either derived from a classification model trained on sample events or predefined, and the classification model is used to determine risk events.

In the embodiments of this specification, there are multiple sample events. For the same risk feature, different sample events may have different feature values. Typically, a classification model is trained in advance on the sample events and is then used to determine the feature weight of each risk feature.

For example, a feature weight can be obtained by calculating a classification accuracy metric of the risk feature with respect to the classification model, where the metric may be, for example, AUC, information entropy, or classification precision.

Of course, the feature weights can also be predefined without relying on the classification model.

A feature weight reflects the importance of a risk feature. In general, risk features with higher feature weights are preferred for describing an event. However, because of the message length constraint (the predetermined condition above), the feature weight is not necessarily the only basis for screening; for example, factors such as the length of the sub-message corresponding to each risk feature can also be taken into account.

A risk event may be a suspicious transaction, such as a suspected money laundering transaction or a transaction in which someone suspected of stealing an account impersonates the account owner. A risk event may also be a suspicious business operation other than a transaction, such as an illegal login.

S204: Select at least some of the risk features according to the feature weights and a predetermined condition, where the predetermined condition is used to constrain the length of a message generated from the risk features.

The method of FIG. 2 makes it possible to select risk features with greater reference value. Based on this method, the embodiments of this specification also provide some specific implementations and extensions, which are described below.

In the embodiments of this specification, predefining the feature weights is straightforward and can generally be done based on the experience of operations staff. The following mainly describes the other way of obtaining feature weights.

For step S202, deriving the feature weights from a classification model trained on sample events may specifically include: training a classification model on sample events; and, for each of the plurality of risk features: obtaining the data in the sample events that corresponds to the risk feature; calculating, from that data, the classification accuracy metric of the risk feature with respect to the classification model; and deriving the feature weight of the risk feature from the classification accuracy metric.

In the embodiments of this specification, the classification accuracy metric of a risk feature with respect to the classification model indicates how accurately the sample events are classified when only the data corresponding to that risk feature is used as input to the classification model. Taking AUC as the metric, for example, a higher AUC indicates more accurate classification.
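As a minimal sketch of the per-feature metric described above, the following Python computes a rank-based AUC for each risk feature taken on its own and uses it as the feature weight. It assumes the raw feature value can serve directly as the score, which stands in for feeding that single feature to the trained classification model; the function names, feature names, and sample values are hypothetical.

```python
from typing import Dict, Sequence


def auc_from_scores(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Rank-based AUC: the probability that a randomly chosen positive sample
    scores higher than a randomly chosen negative one (ties count as 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def feature_weights(columns: Dict[str, Sequence[float]],
                    labels: Sequence[int]) -> Dict[str, float]:
    """Assumption: each feature's own values are used as the score, as a
    stand-in for the trained model evaluated on that single feature."""
    return {name: auc_from_scores(values, labels)
            for name, values in columns.items()}


# Hypothetical sample events: label 1 = suspected money laundering.
labels = [1, 1, 0, 0]
columns = {
    "txn_count_24h": [9, 7, 3, 8],
    "account_age_days": [5, 5, 5, 5],  # carries no information
}
weights = feature_weights(columns, labels)
```

A feature whose values do not separate the two classes at all ends up with a weight of 0.5, the AUC of a random guess.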

The classification model may be, for example, a random forest model or a logistic regression model. Taking a random forest model as an example, suppose the training samples are D = (x, y), where x ∈ R^{n×d} is the model input data and y ∈ R^{n×1} is the sample labels; a sample label indicates, for example, whether the sample event involves money laundering, that is, whether it is a suspected money laundering transaction. Decision trees are then built from the training data x and the labels y, and the random forest model is obtained from the multiple decision trees that are built.

In the embodiments of this specification, a corresponding sub-message can be generated from the data of a risk feature. Each of the plurality of risk features has a corresponding sub-message character count, which can be determined or estimated in advance.

In this case, for step S204, selecting at least some of the risk features according to the feature weights and the predetermined condition may specifically include: performing a first sorting of the plurality of risk features according to the feature weights and the corresponding sub-message character counts; and selecting at least some of the risk features according to the result of the first sorting, the sub-message character counts, and the predetermined condition.

Take the case where the sub-message character count is the predetermined character count of a sub-message template defined in advance for the risk feature. A sub-message template can contain a risk feature and its description statement, with the correspondence between risk features and description statements established in advance, for example <Feature 1, Description Statement 1>, <Feature 2, Description Statement 2>, <Feature 3, Description Statement 3>. Substituting the concrete value of the risk feature into the description statement generally yields the sub-message. The default character count of the description statement is then the predetermined character count mentioned above.
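The template mechanism can be sketched as follows, under stated assumptions: the templates are plain Python format strings with a single `{value}` placeholder, and the predetermined character count is taken to be the length of the description statement before substitution. The feature names and statements are hypothetical.

```python
# Hypothetical <feature, description statement> pairs.
SUB_MESSAGE_TEMPLATES = {
    "txn_count_24h": "The account made {value} transactions within 24 hours.",
    "total_amount": "The transactions totaled {value} yuan.",
}

PLACEHOLDER = "{value}"


def render_sub_message(feature: str, value: object) -> str:
    """Substitute the concrete feature value into its description statement."""
    return SUB_MESSAGE_TEMPLATES[feature].format(value=value)


def template_length(feature: str) -> int:
    """Default character count of the description statement, not counting
    the placeholder (an assumed definition of the predetermined count)."""
    return len(SUB_MESSAGE_TEMPLATES[feature].replace(PLACEHOLDER, ""))


sub_message = render_sub_message("txn_count_24h", 12)
```

Because the rendered sub-message is longer than the template by the length of the substituted value, an implementation that must never exceed the limit would need to reserve some margin or measure the rendered text instead.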

Further, performing the first sorting of the plurality of risk features according to the feature weights and the corresponding sub-message character counts may specifically include: determining a second sorting result obtained by sorting the plurality of risk features by feature weight (the second sorting); selecting at least some of the plurality of risk features according to the second sorting result; and performing the first sorting on the selected risk features according to the feature weights and the corresponding sub-message character counts.

In practice, when there are many risk features, they can first be sorted and/or pre-screened before the formal screening, which reduces the processing resources that screening consumes.

For example, if the second sorting orders the risk features by feature weight from largest to smallest, the risk features near the end of the second sorting result can be discarded and those near the front retained.

Note that pre-screening (based on the second sorting above) is not a mandatory step; whether to perform it can be decided according to actual needs.
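A minimal sketch of this optional pre-screening step is shown below. Keeping a fixed number of top-weighted features is an assumption made for illustration (the text only says that features near the end of the ranking can be discarded), and the function and parameter names are hypothetical.

```python
from typing import Dict, List


def prescreen(weights: Dict[str, float], keep: int) -> List[str]:
    """Second sorting: order features by weight, largest first,
    and retain only the first `keep` of them."""
    ranked = sorted(weights, key=weights.get, reverse=True)
    return ranked[:keep]


retained = prescreen({"a": 0.6, "b": 0.9, "c": 0.7}, keep=2)
```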

In the embodiments of this specification, performing the first sorting of the plurality of risk features according to the feature weights and the corresponding sub-message character counts may specifically include: calculating the unit word-count weight of each risk feature from its feature weight and sub-message character count; and performing the first sorting of the plurality of risk features by unit word-count weight.

The unit word-count weight represents the average contribution of each character in the sub-message to the corresponding feature weight. More concretely, the unit word-count weight can, for example, equal the feature weight divided by the corresponding sub-message character count.

Of course, the risk features can also be sorted and screened by indicators other than the unit word-count weight, such as the amount of information per character.
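The first sorting by unit word-count weight can be sketched as below, using the weight divided by character count definition given as an example above. The function names and the sample numbers are hypothetical.

```python
from typing import Dict, List


def unit_weight(weight: float, char_count: int) -> float:
    """Feature weight contributed per character of the sub-message."""
    if char_count <= 0:
        raise ValueError("sub-message character count must be positive")
    return weight / char_count


def first_sort(weights: Dict[str, float],
               char_counts: Dict[str, int]) -> List[str]:
    """Order features by unit word-count weight, largest first."""
    return sorted(
        weights,
        key=lambda f: unit_weight(weights[f], char_counts[f]),
        reverse=True,
    )


weights = {"a": 0.9, "b": 0.6, "c": 0.8}
char_counts = {"a": 60, "b": 20, "c": 40}
ranked = first_sort(weights, char_counts)
```

In this example feature `a` has the highest weight but also the longest sub-message, so it ranks last once the weight is normalized by length.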

As mentioned when the idea of the solution was introduced, a greedy search strategy is used to solve the problem approximately. The approximate solution process is shown first and then analyzed.

In the embodiments of this specification, selecting at least some of the risk features according to the first sorting result, the sub-message character counts, and the predetermined condition may specifically include:

traversing the risk features in the first sorting result in descending order of unit word-count weight and, for the current risk feature, performing the following:

adding the current risk feature to a working set, and determining whether the sum of the sub-message character counts of the risk features in the working set satisfies the predetermined condition; if so, moving on to the next risk feature; otherwise, removing the current risk feature from the working set, ending the traversal, and taking the risk features in the working set as the selected risk features; the working set is initially empty.

In practice, the traversal need not end when the determination above is negative. For example, the subsequent risk features can still be tried in order, adding each to the working set and checking whether the constraint is satisfied.
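The traversal above can be sketched as a short greedy loop. This sketch assumes the predetermined condition is "total sub-message character count does not exceed `max_chars`"; the flag `stop_at_first_overflow` is a hypothetical name that switches between ending the traversal at the first feature that does not fit and continuing to try later features.

```python
from typing import Dict, List, Sequence


def greedy_select(ranked: Sequence[str],
                  char_counts: Dict[str, int],
                  max_chars: int,
                  stop_at_first_overflow: bool = True) -> List[str]:
    """Walk the features in first-sorting order and keep each one that
    still fits within the character budget."""
    selected: List[str] = []  # the working set, initially empty
    used = 0
    for feature in ranked:
        cost = char_counts[feature]
        if used + cost <= max_chars:
            selected.append(feature)
            used += cost
        elif stop_at_first_overflow:
            break  # base variant: end the traversal
        # otherwise skip this feature and try the next one
    return selected


ranked = ["b", "c", "a", "d"]
char_counts = {"a": 60, "b": 20, "c": 40, "d": 10}
strict = greedy_select(ranked, char_counts, max_chars=70)
relaxed = greedy_select(ranked, char_counts, max_chars=70,
                        stop_at_first_overflow=False)
```

With a budget of 70 characters, the base variant stops when `a` does not fit, while the relaxed variant skips `a` and still has room for the short sub-message of `d`.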

In the embodiments of this specification, for step S204, moving on to the next risk feature may specifically include:

determining the classification accuracy metric of the working set with respect to the classification model;

determining whether this metric is not greater than the classification accuracy metric, with respect to the classification model, of the working set before the current risk feature was added; if so, removing the current risk feature from the working set and moving on to the next risk feature; otherwise, moving on to the next risk feature directly.

To avoid confusion, "the working set before the current risk feature was added" is illustrated with an example. Suppose nine risk features have already been added to the working set (call the working set at this point the current set), and the tenth risk feature (the current risk feature) is about to be added. Then the working set before the current risk feature was added is that current set.
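The variant with a metric check can be sketched as follows. The `metric` callable is a hypothetical stand-in for evaluating the classification model (for example, its AUC) on a given set of features; accepting the first feature unconditionally, because no earlier set exists to compare against, is an assumption of this sketch, as is the "total length does not exceed `max_chars`" form of the predetermined condition.

```python
from typing import Callable, Dict, List, Optional, Sequence


def greedy_select_with_metric(
    ranked: Sequence[str],
    char_counts: Dict[str, int],
    max_chars: int,
    metric: Callable[[Sequence[str]], float],
) -> List[str]:
    selected: List[str] = []
    used = 0
    best: Optional[float] = None  # metric of the set before the current add
    for feature in ranked:
        cost = char_counts[feature]
        if used + cost > max_chars:
            break  # length condition violated: end the traversal
        candidate = selected + [feature]
        score = metric(candidate)
        if best is not None and score <= best:
            continue  # no improvement: drop the feature, try the next one
        selected, used, best = candidate, used + cost, score
    return selected


# Toy metric in integer percentage points; "c" is redundant given "b".
GAIN = {"b": 20, "c": 0, "d": 10}


def toy_metric(features: Sequence[str]) -> float:
    return 50 + sum(GAIN[f] for f in features)


chosen = greedy_select_with_metric(
    ["b", "c", "d"], {"b": 20, "c": 40, "d": 10}, 100, toy_metric
)
```

Here `c` fits within the budget but leaves the metric unchanged, so it is removed, and the characters it would have used remain available for later features.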

The process of approximate solution by greedy search has been shown above and is analyzed below.

To reach the ideal goal described earlier, the risk feature subsets S' would have to be enumerated exhaustively to find the S' with the largest AUC (one example of a classification accuracy metric) among those that satisfy the message length constraint.

The greedy search strategy avoids exhaustive enumeration. Working from the first sorting result, it picks risk features one at a time, each time choosing the best of the remaining risk features (in the example above, the one with the largest unit word-count weight) until the message length constraint is reached. In addition, it assumes as an approximation that the AUC increases every time a risk feature is added, which avoids recomputing the AUC at each step, saves processing resources, and improves screening efficiency.

Of course, for greater precision, the AUC can also be computed every time, because a newly added risk feature may actually lower the AUC; in that case, the risk feature can be removed.

For example, if a risk feature S^(i) is strongly correlated with the working set S' obtained so far, or S^(i) contains obvious noise, so that S^(i) causes the classification ability of the classification model to decrease or stay the same (that is, the classification accuracy metric decreases or stays the same), then S^(i) can be removed from S'.

In the embodiments of this specification, once the risk features have been screened, a description message can further be generated for a risk event to be described, such as a suspected money laundering transaction. Whether an event is a risk event can be determined by the classification model above, by manual experience, or in other ways.

For example, an event to be described is obtained; for each of the selected risk features, a sub-message corresponding to the event is generated; and the sub-messages are assembled into the description message of the event. To improve efficiency, the sub-messages can be generated from predefined sub-message templates.
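Assembling a description message from the selected features can be sketched as below. The event is modeled as a plain dictionary of feature values and the sub-messages are joined with single spaces; both choices, along with the template texts and names, are assumptions for illustration.

```python
from typing import Dict, Sequence

# Hypothetical sub-message templates, one per risk feature.
TEMPLATES = {
    "txn_count_24h": "The account made {value} transactions within 24 hours.",
    "total_amount": "The transactions totaled {value} yuan.",
}


def build_description(event: Dict[str, object],
                      selected: Sequence[str],
                      templates: Dict[str, str]) -> str:
    """Render one sub-message per selected feature, in selection order,
    and concatenate them into the description message."""
    parts = [templates[f].format(value=event[f]) for f in selected]
    return " ".join(parts)


event = {"txn_count_24h": 12, "total_amount": 98000}
message = build_description(event, ["total_amount", "txn_count_24h"], TEMPLATES)
```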

Based on the same idea, the embodiments of this specification also provide a description message generation method, whose flow is shown in FIG. 3.

The flow in FIG. 3 may include the following steps:

S302: Obtain an event to be described.

S304: Determine the screened risk features.

In the embodiments of this specification, the risk features may be screened in advance, before this flow is executed, or screened after the event to be described is obtained.

S306: Generate a description message for the event to be described according to the screened risk features.

In practice, the corresponding sub-messages can be generated while the risk features are being screened, or generated after all risk features have been screened. The description message composed of the sub-messages is then obtained.

The method of FIG. 3 helps generate a more informative description message for the event to be described.

More concretely, the embodiments of this specification also provide an example of the content of a description message generated for a suspicious transaction. The description message includes, for example, six parts, each corresponding to one or more risk features:

First, an overview of the suspicious transaction;

Second, an account of how the suspicious transaction was discovered, for example the time and place;

Third, the opening of the suspicious account, for example the basic information in the account opening documents;

Fourth, the overall picture of the suspicious transactions, for example the time period, the number and amount of transactions involved, the source and destination of the funds, and the transaction flow;

Fifth, an analysis of the suspicious points, listed one by one, for example account opening and closing information and other suspicious information in the transaction process;

Sixth, a judgment on the message: combining all of the data analysis with subjective judgment, a final label is given to the transaction, for example "suspected money laundering transaction".

FIG. 4 is a schematic diagram of a partial screenshot of a description message provided in an embodiment of this specification, showing some of the six parts above. A description message generated according to the embodiments of this specification can highlight the key points without exceeding the message length limit.

In one practical application scenario, two kinds of description message can be generated for a suspected money laundering transaction. One kind is the description message described in the embodiments above, called the deterministic message; it is usually derived directly from objective data and contains no subjective analysis. The other kind is called the non-deterministic message, which may contain subjective analysis. In this case, the message length constraint above applies to the deterministic message.

本说明书实施例提供一种基于疑似洗钱交易自动生成描述报文模型的建模方案，该方案可以包括以下步骤：The embodiments of this specification provide a modeling solution for automatically generating a description message model based on suspected money laundering transactions. The solution may include the following steps:

A labeled training sample set $D(X, Y)$ is given, where $X \in \mathbb{R}^{n \times d}$ is the model input data of the samples and $Y \in \mathbb{R}^{n \times 1}$ holds the sample labels. A sample label can indicate whether the sample event is a money laundering transaction.

Let $S$ denote the set of risk features of the training samples, with $|S| = d$. Given a classification model $f(D)$ for $D$, the aim is to use this model to find a set $S'$ formed by at least some of the risk features whose corresponding deterministic message, denoted $M(S')$, has a length no greater than a given threshold $\lambda - \theta$, that is, $|M(S')| \le \lambda - \theta$. Here $\lambda$ is the total constraint length of the deterministic and nondeterministic messages together, $\theta$ is the constraint length of the nondeterministic message, and $\lambda - \theta$ is therefore the constraint length of the deterministic message (that is, the predetermined message length constraint described above). Each constraint length is usually set in advance according to the actual situation (for example, different reviewers or different environments).
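The length constraint above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and measuring message length in characters is an assumption, since the specification does not fix the unit here.

```python
def deterministic_budget(total_limit: int, nondeterministic_limit: int) -> int:
    """Return lambda - theta, the length allowed for the deterministic message."""
    if nondeterministic_limit > total_limit:
        raise ValueError("theta must not exceed lambda")
    return total_limit - nondeterministic_limit


def fits_budget(sub_messages: list[str], budget: int) -> bool:
    """Check |M(S')| <= lambda - theta, with length counted in characters (assumption)."""
    return sum(len(m) for m in sub_messages) <= budget


# Example values for lambda and theta are made up for illustration.
budget = deterministic_budget(total_limit=500, nondeterministic_limit=200)
```

With these example values the deterministic message may use at most 300 characters, and the remaining 200 are reserved for the nondeterministic message.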

The ideal goal is to select an optimal feature set $S^*$ such that the data set restricted to $S^*$ attains the largest AUC under the classifier $f$, that is, to solve the following combinatorial optimization problem:

$$S^* = \arg\max_{S' \subseteq S} \mathrm{AUC}(D, S', f)$$

$$\text{s.t.} \quad |M(S')| \le \lambda - \theta$$

Here the objective function $\mathrm{AUC}(D, S', f)$ denotes the AUC of $D$ under the classifier $f$ after a feature subset $S'$ has been selected according to some scheme.
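To make the cost of the ideal problem concrete, the following sketch solves it by exhaustive enumeration on a toy instance. It is not the method of the specification. The `score` callable is a hypothetical stand-in for $\mathrm{AUC}(D, S', f)$, and the gains and lengths are invented. The loop visits all $2^d$ subsets, which is why an approximate search is preferred for realistic $d$.

```python
from itertools import combinations
from typing import Callable, Sequence


def exhaustive_select(
    lengths: Sequence[int],
    budget: int,
    score: Callable[[tuple[int, ...]], float],
) -> tuple[int, ...]:
    """Return the best-scoring subset of feature indices that fits the budget."""
    d = len(lengths)
    best, best_score = (), float("-inf")
    for k in range(d + 1):
        for subset in combinations(range(d), k):
            if sum(lengths[i] for i in subset) > budget:
                continue  # violates |M(S')| <= lambda - theta
            s = score(subset)
            if s > best_score:
                best, best_score = subset, s
    return best


# Hypothetical per-feature gains; a real objective would train and evaluate f.
GAINS = [0.20, 0.15, 0.10]


def toy_score(subset: tuple[int, ...]) -> float:
    return 0.5 + sum(GAINS[i] for i in subset)


best_subset = exhaustive_select([120, 60, 50], budget=120, score=toy_score)
```

In this toy instance the single strongest feature uses the whole budget, while the two shorter features together score higher, so the optimum is the pair rather than the top-weighted feature.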

当然，根据前面的分析可知，要达到这种理想的目标成本较高，因此，退而求其次，利用贪心搜索策略近似求解。图5为本说明书实施例中提供的一种自动报文算法的示意图，即反映了该近似求解过程。Of course, according to the previous analysis, it is costly to achieve this ideal goal. Therefore, as a second best option, a greedy search strategy is used to approximate the solution. FIG5 is a schematic diagram of an automatic message algorithm provided in an embodiment of this specification, which reflects the approximate solution process.

In Figure 5, the feature weight inverted list is the second sorting result described above, S' is the set described above, and step 3 is the process of traversing and screening the risk features described above. Note that in Figure 5 the sub-messages are generated while the risk features are being screened, so that by the time screening is complete, the sub-messages that make up the deterministic message have already been obtained.
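A minimal sketch of the greedy approximation follows, based on the traversal described for the screening step: sort by unit word count weight, add the current feature, stop when the word count budget is exceeded, and drop a feature that does not improve the metric. The `metric` callable is a hypothetical stand-in for evaluating the classifier's AUC on the selected features, and using its value on the empty set as the initial baseline is an assumption. Word counts are assumed to be positive.

```python
from typing import Callable, Sequence


def greedy_select(
    weights: Sequence[float],
    word_counts: Sequence[int],
    budget: int,
    metric: Callable[[Sequence[int]], float],
) -> list[int]:
    """Greedily pick feature indices under a total word count budget."""
    # First sorting: descending unit word count weight (weight per word).
    order = sorted(
        range(len(weights)),
        key=lambda i: weights[i] / word_counts[i],
        reverse=True,
    )
    selected: list[int] = []
    best = metric(selected)  # baseline for the empty set (assumption)
    for i in order:
        selected.append(i)
        if sum(word_counts[j] for j in selected) > budget:
            selected.pop()  # budget exceeded: drop the feature and stop
            break
        current = metric(selected)
        if current <= best:
            selected.pop()  # no improvement: drop the feature, keep going
        else:
            best = current
    return selected


# Toy data; TOY_GAINS and toy_metric are invented for illustration only.
TOY_GAINS = [0.20, 0.15, 0.0, 0.05]


def toy_metric(selected: Sequence[int]) -> float:
    return 0.5 + sum(TOY_GAINS[i] for i in selected)


picked = greedy_select([0.20, 0.15, 0.10, 0.05], [100, 50, 40, 10], 110, toy_metric)
```

In the toy run, feature 2 fits the budget but is dropped because it does not raise the metric, and the traversal ends at feature 0 because adding it would exceed the budget.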

进一步地，本说明书实施例还提供了一种实际应用场景下的可疑交易甄别流程示意图，如图6所示。Furthermore, the embodiments of this specification also provide a schematic diagram of a suspicious transaction identification process in an actual application scenario, as shown in FIG6 .

The process in Figure 6 mainly includes: generating a description message generation task based on suspicious rules, where the task targets a suspected money laundering transaction; automatically executing the task using the solution of this specification (that is, generating a description message for the suspected money laundering transaction); and then subjecting the description message to a manual initial review and a manual re-review.

基于同样的思路，本说明书实施例还提供了对应的装置，如图7、图8所示。Based on the same idea, the embodiments of this specification also provide corresponding devices, as shown in Figures 7 and 8.

图7为本说明书实施例提供的对应于图2的一种风险特征筛选装置的结构示意图，包括：FIG7 is a schematic diagram of the structure of a risk feature screening device corresponding to FIG2 provided in an embodiment of this specification, including:

获取模块701，获取多个风险特征分别的特征权重，所述特征权重根据利用样本事件训练得到的分类模型得到或者预定义得到，所述分类模型用于判定风险事件；An acquisition module 701 acquires feature weights of a plurality of risk features, wherein the feature weights are obtained based on a classification model trained using sample events or are predefined, and the classification model is used to determine risk events;

筛选模块702，根据所述特征权重和预定条件，筛选出至少部分风险特征，所述预定条件用于约束根据风险特征所生成报文的长度。The screening module 702 screens out at least some risk features according to the feature weights and predetermined conditions, where the predetermined conditions are used to constrain the length of messages generated according to the risk features.

可选地，所述装置还包括权重确定模块703；Optionally, the apparatus further includes a weight determination module 703;

所述权重确定模块703根据利用样本事件训练得到的分类模型得到所述特征权重，具体包括：The weight determination module 703 obtains the feature weight according to the classification model trained using the sample events, specifically including:

所述权重确定模块703利用样本事件训练得到分类模型；The weight determination module 703 obtains a classification model by training sample events;

分别针对所述多个风险特征执行：Execute for each of the risk features:

获取所述样本事件中对应于该风险特征的数据；Obtaining data corresponding to the risk feature in the sample event;

According to the data corresponding to the risk feature, calculating the classification accuracy metric of the classification model corresponding to the risk feature;

According to the classification accuracy metric, obtaining the feature weight of the risk feature.
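The per-feature weighting can be illustrated as follows, using AUC as the classification accuracy metric. This is a sketch under stated assumptions: each feature's raw values are used directly as scores in place of a trained classification model's output, the AUC is computed with the pairwise (Mann-Whitney) formulation, and the feature names and data are invented.

```python
from typing import Mapping, Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the fraction of positive/negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def feature_weights(
    columns: Mapping[str, Sequence[float]], labels: Sequence[int]
) -> dict[str, float]:
    """Use the AUC obtained from each feature's data alone as that feature's weight."""
    return {name: auc(values, labels) for name, values in columns.items()}


# Hypothetical sample events: label 1 marks a money laundering transaction.
labels = [1, 1, 0, 0]
columns = {
    "night_transfers": [5, 3, 1, 2],
    "account_age": [1, 2, 2, 1],
}
weights = feature_weights(columns, labels)
```

Here `night_transfers` separates the two classes perfectly and receives weight 1.0, while `account_age` carries no ranking information and receives 0.5.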

Optionally, each of the plurality of risk features has a corresponding sub-message word count; the screening module 702 screening out at least some risk features according to the feature weights and the predetermined conditions specifically includes:

The screening module 702 performing a first sorting of the plurality of risk features according to the feature weights and the corresponding sub-message word counts;

Screening out at least some risk features according to the first sorting result, the sub-message word counts, and the predetermined conditions.

Optionally, the screening module 702 performing the first sorting of the plurality of risk features according to the feature weights and the corresponding sub-message word counts specifically includes:

The screening module 702 determining a second sorting result obtained by performing a second sorting on the plurality of risk features by feature weight;

Selecting at least some of the plurality of risk features according to the second sorting result;

Performing the first sorting on the selected risk features according to the feature weights and the corresponding sub-message word counts.

Optionally, the screening module 702 performing the first sorting of the plurality of risk features according to the feature weights and the corresponding sub-message word counts specifically includes:

The screening module 702 calculating the unit word count weight of each risk feature according to the feature weight and the sub-message word count corresponding to that risk feature;

Performing the first sorting on the plurality of risk features according to the unit word count weight.
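The two sorting steps can be sketched together as follows. The unit word count weight is taken here as feature weight divided by sub-message word count, which is an assumption about how the two quantities are combined. The optional `top_k` cut-off used to select features from the second sorting result is also hypothetical, as are the feature names and values.

```python
from typing import Optional


def first_sorting(
    weights: dict[str, float],
    word_counts: dict[str, int],
    top_k: Optional[int] = None,
) -> list[str]:
    """Order features by unit word count weight, optionally after a top-k cut by weight."""
    # Second sorting: by feature weight alone, largest first.
    by_weight = sorted(weights, key=lambda name: weights[name], reverse=True)
    candidates = by_weight if top_k is None else by_weight[:top_k]
    # First sorting: by unit word count weight, largest first.
    return sorted(
        candidates,
        key=lambda name: weights[name] / word_counts[name],
        reverse=True,
    )


# Invented example data.
example_weights = {"a": 0.9, "b": 0.8, "c": 0.6, "d": 0.55}
example_counts = {"a": 90, "b": 40, "c": 20, "d": 5}

all_sorted = first_sorting(example_weights, example_counts)
top3_sorted = first_sorting(example_weights, example_counts, top_k=3)
```

Feature `d` has the lowest weight but the shortest sub-message, so it ranks first when every feature is considered, and it is excluded entirely when only the three highest-weighted features are kept.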

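Optionally, the screening module 702 screening out at least some risk features according to the first sorting result, the sub-message word counts, and the predetermined conditions specifically includes:

The screening module 702 traversing the risk features included in the first sorting result in descending order of unit word count weight, and performing the following for the current risk feature:

Adding the current risk feature to a set that is initially empty, and determining whether the sum of the sub-message word counts of the risk features in the set meets the predetermined conditions; if so, traversing to the next risk feature; otherwise, removing the current risk feature from the set, ending the traversal, and taking the risk features in the set as the at least some risk features that are screened out.

Optionally, the screening module 702 traversing to the next risk feature specifically includes:

The screening module 702 determining the classification accuracy metric of the set with respect to the classification model;

Determining whether the classification accuracy metric is not greater than the classification accuracy metric of the set with respect to the classification model before the current risk feature was added; if so, removing the current risk feature from the set and traversing to the next risk feature; otherwise, traversing to the next risk feature.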

可选地，所述分类准确性度量指标包括受试者工作特征曲线线下面积AUC。Optionally, the classification accuracy metric includes the area under the receiver operating characteristic curve (AUC).

可选地，所述装置还包括：Optionally, the device further comprises:

报文生成模块704，获取待描述事件；The message generation module 704 obtains the event to be described;

分别针对筛选出至少部分风险特征，生成对应于所述待描述事件的子报文，Generating sub-messages corresponding to the event to be described for each of the at least some risk features screened out,

根据各所述子报文，为所述待描述事件生成描述报文。A description message is generated for the event to be described according to each of the sub-messages.
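As an illustration of the message generation module, the sketch below builds one sub-message per screened-out risk feature and joins them into a description message. Template-based generation is an assumption made for this example; the template texts, field names, and event data are all hypothetical.

```python
def generate_description(
    event: dict[str, object],
    selected: list[str],
    templates: dict[str, str],
) -> str:
    """Fill one template per selected risk feature and join the sub-messages."""
    sub_messages = [templates[name].format(**event) for name in selected]
    return " ".join(sub_messages)


# Invented event and templates for illustration.
event = {"account": "A-001", "count": 12, "amount": 50000}
templates = {
    "frequency": "Account {account} made {count} transfers in the period.",
    "volume": "Total amount transferred was {amount}.",
}
message = generate_description(event, ["frequency", "volume"], templates)
```

Because the sub-messages are produced in the order of the selected features, the most informative features per word appear first when the selection comes from the first sorting result.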

可选地，所述待描述事件被所述分类模型判定为风险事件，所述风险事件为疑似洗钱交易。Optionally, the event to be described is determined by the classification model to be a risk event, and the risk event is a suspected money laundering transaction.

图8为本说明书实施例提供的对应于图3的一种描述报文生成装置的结构示意图，包括：FIG8 is a schematic structural diagram of a description message generating device corresponding to FIG3 provided in an embodiment of this specification, including:

获取模块801，获取待描述事件；Acquisition module 801, acquires the event to be described;

确定模块802，确定筛选出的各风险特征；Determination module 802, determining each risk feature screened out;

生成模块803，根据所述筛选出的各风险特征，为所述待描述事件生成描述报文；A generating module 803 generates a description message for the event to be described based on the screened risk features;

基于同样的思路，本说明书实施例还提供了一种电子设备，包括：Based on the same idea, an embodiment of this specification further provides an electronic device, including:

至少一个处理器；以及，at least one processor; and,

基于同样的思路，本说明书实施例还提供了另一种电子设备，包括：Based on the same idea, the embodiments of this specification also provide another electronic device, including:

至少一个处理器；以及，at least one processor; and,

获取待描述事件；Get the event to be described;

基于同样的思路，本说明书实施例还提供了一种非易失性计算机存储介质，存储有计算机可执行指令，所述计算机可执行指令设置为：Based on the same idea, an embodiment of this specification further provides a non-volatile computer storage medium storing computer-executable instructions, wherein the computer-executable instructions are configured as follows:

基于同样的思路，本说明书实施例还提供了另一种非易失性计算机存储介质，存储有计算机可执行指令，所述计算机可执行指令设置为：Based on the same idea, the embodiments of this specification also provide another non-volatile computer storage medium storing computer-executable instructions, wherein the computer-executable instructions are configured as follows:

获取待描述事件；Get the event to be described;

Specific embodiments of this specification have been described above. Other embodiments fall within the scope of the appended claims. In some cases, the actions or steps recited in the claims can be performed in an order different from that in the embodiments and still achieve the desired results. In addition, the processes depicted in the accompanying drawings do not necessarily require the particular order shown, or a sequential order, to achieve the desired results. In some implementations, multitasking and parallel processing are also possible or may be advantageous.

The embodiments in this specification are described in a progressive manner. For parts that are the same or similar across embodiments, the embodiments may be referred to one another, and each embodiment focuses on its differences from the other embodiments. In particular, because the device, electronic device, and non-volatile computer storage medium embodiments are basically similar to the method embodiments, they are described relatively briefly; for relevant details, refer to the corresponding description of the method embodiments.

本说明书实施例提供的装置、电子设备、非易失性计算机存储介质与方法是对应的，因此，装置、电子设备、非易失性计算机存储介质也具有与对应方法类似的有益技术效果，由于上面已经对方法的有益技术效果进行了详细说明，因此，这里不再赘述对应装置、电子设备、非易失性计算机存储介质的有益技术效果。The apparatus, electronic device, and non-volatile computer storage medium provided in the embodiments of this specification correspond to the method. Therefore, the apparatus, electronic device, and non-volatile computer storage medium also have similar beneficial technical effects as the corresponding method. Since the beneficial technical effects of the method have been described in detail above, the beneficial technical effects of the corresponding apparatus, electronic device, and non-volatile computer storage medium will not be repeated here.

In the 1990s, an improvement to a technology could be clearly classified as either a hardware improvement (for example, an improvement to a circuit structure such as a diode, transistor, or switch) or a software improvement (an improvement to a method flow). With the development of technology, however, many of today's method flow improvements can be regarded as direct improvements to hardware circuit structures. Designers almost always obtain the corresponding hardware circuit structure by programming the improved method flow into a hardware circuit. It therefore cannot be said that an improvement to a method flow cannot be implemented with a hardware entity module. For example, a programmable logic device (PLD), such as a field programmable gate array (FPGA), is an integrated circuit whose logic function is determined by the user programming the device. Designers program a digital system onto a single PLD themselves, without asking a chip manufacturer to design and fabricate a dedicated integrated circuit chip. Moreover, instead of fabricating integrated circuit chips by hand, this programming is nowadays mostly done with "logic compiler" software, which is similar to the software compiler used in program development; the source code to be compiled must likewise be written in a specific programming language, called a hardware description language (HDL). There is not just one HDL but many, such as ABEL (Advanced Boolean Expression Language), AHDL (Altera Hardware Description Language), Confluence, CUPL (Cornell University Programming Language), HDCal, JHDL (Java Hardware Description Language), Lava, Lola, MyHDL, PALASM, and RHDL (Ruby Hardware Description Language); the most commonly used at present are VHDL (Very-High-Speed Integrated Circuit Hardware Description Language) and Verilog. Those skilled in the art will also understand that a hardware circuit implementing the logical method flow can easily be obtained simply by programming the method flow in one of these hardware description languages and programming it into an integrated circuit.

The controller can be implemented in any suitable manner. For example, the controller can take the form of a microprocessor or processor together with a computer-readable medium storing computer-readable program code (for example, software or firmware) executable by the (micro)processor, logic gates, switches, an application-specific integrated circuit (ASIC), a programmable logic controller, or an embedded microcontroller. Examples of controllers include, but are not limited to, the following microcontrollers: ARC 625D, Atmel AT91SAM, Microchip PIC18F26K20, and Silicon Labs C8051F320. A memory controller can also be implemented as part of the control logic of a memory. Those skilled in the art also know that, in addition to implementing the controller purely as computer-readable program code, the method steps can be logically programmed so that the controller achieves the same functions in the form of logic gates, switches, application-specific integrated circuits, programmable logic controllers, embedded microcontrollers, and the like. Such a controller can therefore be regarded as a hardware component, and the means included in it for implementing various functions can be regarded as structures within the hardware component. Or, the means for implementing various functions can even be regarded both as software modules implementing the method and as structures within the hardware component.

上述实施例阐明的系统、装置、模块或单元，具体可以由计算机芯片或实体实现，或者由具有某种功能的产品来实现。一种典型的实现设备为计算机。具体的，计算机例如可以为个人计算机、膝上型计算机、蜂窝电话、相机电话、智能电话、个人数字助理、媒体播放器、导航设备、电子邮件设备、游戏控制台、平板计算机、可穿戴设备或者这些设备中的任何设备的组合。The systems, devices, modules, or units described in the above embodiments may be implemented by computer chips or entities, or by products having certain functions. A typical implementation device is a computer. Specifically, the computer may be, for example, a personal computer, a laptop computer, a cellular phone, a camera phone, a smartphone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.

为了描述的方便，描述以上装置时以功能分为各种单元分别描述。当然，在实施本说明书一个或多个实施例时可以把各单元的功能在同一个或多个软件和/或硬件中实现。For the convenience of description, the above devices are described as being divided into various units according to their functions. Of course, when implementing one or more embodiments of this specification, the functions of each unit can be implemented in the same or multiple software and/or hardware.

本领域内的技术人员应明白，本说明书实施例可提供为方法、系统、或计算机程序产品。因此，本说明书实施例可采用完全硬件实施例、完全软件实施例、或结合软件和硬件方面的实施例的形式。而且，本说明书实施例可采用在一个或多个其中包含有计算机可用程序代码的计算机可用存储介质(包括但不限于磁盘存储器、CD-ROM、光学存储器等)上实施的计算机程序产品的形式。Those skilled in the art will appreciate that the embodiments of this specification may be provided as methods, systems, or computer program products. Therefore, the embodiments of this specification may take the form of a complete hardware embodiment, a complete software embodiment, or an embodiment combining software and hardware. Furthermore, the embodiments of this specification may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to magnetic disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.

本说明书是参照根据本说明书实施例的方法、设备(系统)、和计算机程序产品的流程图和/或方框图来描述的。应理解可由计算机程序指令实现流程图和/或方框图中的每一流程和/或方框、以及流程图和/或方框图中的流程和/或方框的结合。可提供这些计算机程序指令到通用计算机、专用计算机、嵌入式处理机或其他可编程数据处理设备的处理器以产生一个机器，使得通过计算机或其他可编程数据处理设备的处理器执行的指令产生用于实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能的装置。This specification is described with reference to the flowcharts and/or block diagrams of the methods, devices (systems), and computer program products according to the embodiments of this specification. It should be understood that each process and/or box in the flowchart and/or block diagram, as well as the combination of the processes and/or boxes in the flowchart and/or block diagram, can be implemented by computer program instructions. These computer program instructions can be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor, or other programmable data processing device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing device produce a device for implementing the functions specified in one or more processes in the flowchart and/or one or more boxes in the block diagram.

这些计算机程序指令也可存储在能引导计算机或其他可编程数据处理设备以特定方式工作的计算机可读存储器中，使得存储在该计算机可读存储器中的指令产生包括指令装置的制造品，该指令装置实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能。These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing device to operate in a specific manner, so that the instructions stored in the computer-readable memory produce a product including an instruction device that implements the functions specified in one or more processes in the flowchart and/or one or more boxes in the block diagram.

这些计算机程序指令也可装载到计算机或其他可编程数据处理设备上，使得在计算机或其他可编程设备上执行一系列操作步骤以产生计算机实现的处理，从而在计算机或其他可编程设备上执行的指令提供用于实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能的步骤。These computer program instructions can also be loaded onto a computer or other programmable data processing device so that a series of operating steps are executed on the computer or other programmable device to produce a computer-implemented process, so that the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more processes in the flowchart and/or one or more boxes in the block diagram.

在一个典型的配置中，计算设备包括一个或多个处理器(CPU)、输入/输出接口、网络接口和内存。In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.

内存可能包括计算机可读介质中的非永久性存储器，随机存取存储器(RAM)和/或非易失性内存等形式，如只读存储器(ROM)或闪存(flash RAM)。内存是计算机可读介质的示例。Memory may include non-permanent storage in a computer-readable medium, random access memory (RAM) and/or non-volatile memory in the form of read-only memory (ROM) or flash RAM. Memory is an example of a computer-readable medium.

Computer-readable media include permanent and non-permanent, removable and non-removable media, and can store information by any method or technology. The information can be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory media, such as modulated data signals and carrier waves.

还需要说明的是，术语“包括”、“包含”或者其任何其他变体意在涵盖非排他性的包含，从而使得包括一系列要素的过程、方法、商品或者设备不仅包括那些要素，而且还包括没有明确列出的其他要素，或者是还包括为这种过程、方法、商品或者设备所固有的要素。在没有更多限制的情况下，由语句“包括一个……”限定的要素，并不排除在包括所述要素的过程、方法、商品或者设备中还存在另外的相同要素。It should also be noted that the terms "comprises," "includes," or any other variations thereof are intended to encompass non-exclusive inclusion, such that a process, method, commodity, or apparatus that includes a series of elements includes not only those elements but also other elements not explicitly listed, or includes elements inherent to such process, method, commodity, or apparatus. In the absence of further limitations, an element defined by the phrase "comprises a ..." does not exclude the presence of other identical elements in the process, method, commodity, or apparatus that includes the element.

本说明书可以在由计算机执行的计算机可执行指令的一般上下文中描述，例如程序模块。一般地，程序模块包括执行特定任务或实现特定抽象数据类型的例程、程序、对象、组件、数据结构等等。也可以在分布式计算环境中实践说明书，在这些分布式计算环境中，由通过通信网络而被连接的远程处理设备来执行任务。在分布式计算环境中，程序模块可以位于包括存储设备在内的本地和远程计算机存储介质中。This specification may be described in the general context of computer-executable instructions, such as program modules, executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, and the like that perform specific tasks or implement specific abstract data types. The specification may also be practiced in distributed computing environments where tasks are performed by remote processing devices connected through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media, including storage devices.

The embodiments in this specification are described in a progressive manner. For parts that are the same or similar across embodiments, the embodiments may be referred to one another, and each embodiment focuses on its differences from the other embodiments. In particular, because the system embodiments are basically similar to the method embodiments, they are described relatively briefly; for relevant details, refer to the corresponding description of the method embodiments.

The foregoing is merely an embodiment of this specification and is not intended to limit the present application. For those skilled in the art, various modifications and variations may be made to the present application. Any modification, equivalent substitution, or improvement made within the spirit and principles of the present application shall fall within the scope of the claims of the present application.

Claims

1. A risk feature screening method, comprising:

obtaining feature weights of a plurality of risk features, wherein the feature weights are obtained from a classification model trained using sample events or are predefined, and the classification model is used to determine risk events;

screening out at least some of the risk features according to the feature weights and predetermined conditions, wherein the predetermined conditions are used to constrain the length of the message generated from the risk features;

wherein each of the plurality of risk features has a corresponding sub-message word count, and screening out at least some of the risk features according to the feature weights and the predetermined conditions specifically includes:

performing a first sorting of the plurality of risk features according to the feature weights and the corresponding sub-message word counts; and

screening out at least some of the risk features according to the first sorting result, the sub-message word counts, and the predetermined conditions.

2. The method of claim 1, wherein obtaining the feature weights from a classification model trained using sample events specifically includes:

training a classification model using sample events;

performing the following for each of the plurality of risk features:

obtaining the data corresponding to the risk feature from the sample events;

calculating, from the data corresponding to the risk feature, the classification accuracy metric of the classification model corresponding to the risk feature; and

obtaining the feature weight of the risk feature from the classification accuracy metric.

3. The method of claim 1, wherein performing the first sorting of the plurality of risk features according to the feature weights and the corresponding sub-message word counts specifically includes:

determining a second sorting result obtained by performing a second sorting on the plurality of risk features by feature weight;

selecting at least some of the plurality of risk features according to the second sorting result; and

performing the first sorting on the selected risk features according to the feature weights and the corresponding sub-message word counts.

4. The method of claim 1, wherein performing the first sorting of the plurality of risk features according to the feature weights and the corresponding sub-message word counts specifically includes:

calculating the unit word count weight of each risk feature from the feature weight and the sub-message word count corresponding to that risk feature; and

performing the first sorting on the plurality of risk features according to the unit word count weight.

5. The method of claim 1, wherein screening out at least some of the risk features according to the first sorting result, the sub-message word counts, and the predetermined conditions specifically includes:

traversing the risk features included in the first sorting result in descending order of unit word count weight, and performing the following for the current risk feature:

adding the current risk feature to a set, the set being initially empty, and determining whether the sum of the sub-message word counts of the risk features in the set meets the predetermined conditions; if so, traversing to the next risk feature; otherwise, removing the current risk feature from the set, ending the traversal, and taking the risk features in the set as the at least some risk features that are screened out.

6. The method of claim 5, wherein traversing to the next risk feature specifically includes:

determining the classification accuracy metric of the set with respect to the classification model; and

determining whether the classification accuracy metric is not greater than the classification accuracy metric of the set with respect to the classification model before the current risk feature was added; if so, removing the current risk feature from the set and traversing to the next risk feature; otherwise, traversing to the next risk feature.

7. The method of claim 2 or 6, wherein the classification accuracy metric includes the area under the receiver operating characteristic curve (AUC).

8. The method according to any one of claims 1 to 6, wherein the method further comprises:

Acquire the event to be described;

For each risk feature that has been screened out, generate a sub-message corresponding to the event to be described, and generate a description message for the event to be described based on the sub-messages.
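The generation step of claim 8 can be sketched with one text template per risk feature. The templates, feature names, and event fields below are hypothetical; the sketch only shows sub-messages being produced per selected feature and joined into a single description message.

```python
# Hypothetical sub-message templates, one per risk feature.
TEMPLATES = {
    "night_trading": "{count} transactions took place between 00:00 and 05:00.",
    "rapid_in_out": "Incoming funds were transferred out within {hours} hours.",
}


def generate_description(event: dict, selected: list[str]) -> str:
    # One sub-message per selected risk feature, filled from the event data.
    sub_messages = [TEMPLATES[name].format(**event[name]) for name in selected]
    # The description message is the concatenation of the sub-messages.
    return " ".join(sub_messages)


event = {"night_trading": {"count": 12}, "rapid_in_out": {"hours": 2}}
message = generate_description(event, ["night_trading", "rapid_in_out"])
print(message)
```

Because only the screened-out features contribute sub-messages, the length of the resulting message is bounded by the word counts used during screening, provided each template's output stays within its assumed word count.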

9. The method of claim 8, wherein the event to be described is determined by the classification model to be a risk event, and the risk event is a suspected money laundering transaction.

10. A description message generation method, comprising:

Acquire the event to be described;

Determine the risk features that have been screened out;

Based on the risk features that have been screened out, generate a description message for the event to be described;

The risk features are screened out as follows: acquire the feature weights of multiple risk features respectively, and screen out the risk features according to the feature weights and predetermined conditions, wherein the feature weights are obtained based on a classification model trained using sample events or are predefined, the classification model is used to determine risk events, and the predetermined conditions are used to constrain the length of the message generated based on the risk features;

Each of the multiple risk features has a corresponding sub-message word count; screening out the risk features according to the feature weights and predetermined conditions specifically includes:

11. A risk feature screening apparatus, comprising:

An acquisition module, which acquires the feature weights of multiple risk features, wherein the feature weights are obtained based on a classification model trained using sample events or are predefined, and the classification model is used to determine risk events;

A screening module, which screens out at least some of the risk features based on the feature weights and predetermined conditions, wherein the predetermined conditions are used to constrain the length of the message generated based on the risk features;

Each of the multiple risk features has a corresponding sub-message word count; the screening module screening out at least some of the risk features based on the feature weights and predetermined conditions specifically includes:

The screening module performs a first sorting of the multiple risk features based on the feature weights and the corresponding sub-message word counts;

12. The apparatus of claim 11, further comprising a weight determination module;

The weight determination module obtains the feature weights based on the classification model trained using sample events, specifically including:

The weight determination module uses sample events to train a classification model;

13. The apparatus of claim 11, wherein the screening module performs a first sorting of the plurality of risk features according to the feature weights and the corresponding sub-report word counts, specifically including:

The screening module determines the second sorting result obtained by sorting the multiple risk features according to their weights.

14. The apparatus of claim 11, wherein the screening module performs a first sorting of the plurality of risk features according to the feature weights and the corresponding sub-report word counts, specifically including:

The screening module calculates, for each risk feature, the weight per unit word count based on the feature weight and the sub-message word count corresponding to that risk feature;

15. The apparatus of claim 11, wherein the screening module screening out at least some of the risk features based on the first sorting result, the sub-message word counts, and the predetermined conditions specifically includes:

The screening module traverses the risk features included in the first sorting result in descending order of weight per unit word count, and executes the following for the current risk feature:

16. The apparatus of claim 15, wherein the screening module iterates to the next risk feature, specifically including:

The screening module determines the classification accuracy metric of the classification model corresponding to the set;

17. The apparatus of claim 12 or 16, wherein the classification accuracy metric includes the area under the receiver operating characteristic curve (AUC).

18. The apparatus according to any one of claims 11 to 16, wherein the apparatus further comprises:

A message generation module, which acquires the event to be described;

Generates, for each risk feature that has been screened out, a sub-message corresponding to the event to be described;

And generates, based on the sub-messages, a description message for the event to be described.

19. The apparatus of claim 18, wherein the event to be described is determined by the classification model to be a risk event, and the risk event is a suspected money laundering transaction.

20. A description message generation apparatus, comprising:

An acquisition module, which acquires the event to be described;

A determination module, which determines the risk features that have been screened out;

A generation module, which generates a description message for the event to be described based on the risk features that have been screened out.

21. An electronic device for risk feature screening, comprising:

At least one processor; and,

A memory communicatively connected to the at least one processor; wherein,

The memory stores instructions executable by the at least one processor, which, when executed by the at least one processor, enable the at least one processor to:

22. An electronic device for description message generation, comprising:

At least one processor; and,

A memory communicatively connected to the at least one processor; wherein,

The memory stores instructions that can be executed by the at least one processor, the instructions being executed by the at least one processor to enable the at least one processor to:

Acquire the event to be described;

Determine the risk features that have been screened out;