WO2019144311A1

WO2019144311A1 - Rule embedded artificial neural network system and training method thereof

Info

Publication number: WO2019144311A1
Application number: PCT/CN2018/073945
Authority: WO
Inventors: 王虎
Original assignee: Lohas Technology (beijing) Corp Ltd
Current assignee: Lohas Technology (beijing) Corp Ltd
Priority date: 2018-01-24
Filing date: 2018-01-24
Publication date: 2019-08-01
Anticipated expiration: 2020-07-24

Abstract

Disclosed in the present invention are a rule embedded artificial neural network system and a training method thereof. The system comprises: an input layer, a first intermediate layer, a regularization modulation layer, a second intermediate layer, and an output layer. An output end of the input layer is respectively connected to input ends of the first intermediate layer and the second intermediate layer. A neuron output end of the first intermediate layer is connected to an input end of the regularization modulation layer. The second intermediate layer and a neuron output end of the regularization modulation layer are connected to an input end of the output layer. The invention solves the technical problem that the artificial neural network provided in the related art has high complexity and large demand for computing resources.

Description

Rule embedded artificial neural network system and training method thereof

Technical field

本发明涉及神经网络领域，具体而言，涉及一种规则嵌入式人工神经网络系统及其训练方法。The present invention relates to the field of neural networks, and in particular to a rule embedded artificial neural network system and a training method thereof.

Background technique

目前，人工神经网络技术凭借其卓越的推断性能，已广泛应用于语音识别、图像识别、视频分析、自然语言处理等问题，极大地提高了人们的生产效率和生活质量。At present, artificial neural network technology has been widely used in speech recognition, image recognition, video analysis, natural language processing and other issues with its superior inferential performance, which greatly improves people's production efficiency and quality of life.

然而，人工神经网络主要从大量数据中学习统计规律，因而其性能容易受到数据量的制约。为了实现复杂模式的准确识别，相关技术中通常使用更宽、更深的神经网络，为此需要更多的计算资源。这使得计算能力较弱的计算设备(尤其是嵌入式处理平台)难以满足既定应用场景所需的准确性和实时性。However, artificial neural networks mainly learn statistical laws from a large amount of data, so their performance is easily restricted by the amount of data. In order to achieve accurate identification of complex patterns, a wider and deeper neural network is commonly used in the related art, and more computing resources are needed for this. This makes computing devices with weak computing power (especially embedded processing platforms) difficult to meet the accuracy and real-time requirements of a given application scenario.

另一方面，现有的人工神经网络通常使用端到端的推断方法，该推断过程难以通过人类的知识逻辑进行解释。同时，这也使得人工神经网络的推断受到某些噪声影响而出现错误。人工神经网络的错误推断越多，适用范围就越小，对生产、生活的价值便越小。这使得利用人类的知识逻辑进行自动纠错成为必要。On the other hand, existing artificial neural networks usually use end-to-end inference methods, which are difficult to interpret through human knowledge logic. At the same time, this also causes the inference of the artificial neural network to be affected by some noise and errors. The more erroneous inferences made by artificial neural networks, the smaller the scope of application and the lower the value of production and life. This makes it necessary to use human knowledge logic for automatic error correction.

实际上，人类的知识逻辑可以描述为计算机可处理的规则，进而对部分错误推断进行自动纠错。利用规则进行自动纠错可以分为预处理和后处理两种方式。In fact, human knowledge logic can be described as computer-processable rules, which in turn automatically correct errors in partial error inference. Automatic correction using rules can be divided into pre-processing and post-processing.

预处理纠错方式使用规则对输入数据进行处理。例如：专利申请号为CN201710239556.X的中国申请中提供了如下技术方案：首先，通过使用现有的知识库对输入数据进行预处理，生成嵌入一阶逻辑规则的三元组(规则化数据)，每个三元组包含输入数据中的两个词语及其逻辑规则。然后，再使用三元组作为输入数据交由神经网络进行模型训练和推断。The pre-processing error correction method uses rules to process the input data. For example, the following technical solution is provided in the Chinese application with the patent application number CN201710239556.X: First, the input data is preprocessed by using the existing knowledge base to generate a triplet (regularized data) embedded with the first-order logic rule. Each triple contains two words in the input data and their logical rules. Then, using the triples as input data, the neural network is used for model training and inference.

这种方式虽然结合了输入信号之间的关联逻辑，进而能够在一定程度地减少错误推断。然而，这种方式的缺陷在于：由于使用的知识库独立于输入数据的整体，因此所生成的规则化数据通常只能达到对局部数据(词语)的理解，但是却达不到对数据全局的建模和理解，从而导致其纠错能力有限。Although this method combines the correlation logic between the input signals, it can reduce the erroneous inference to some extent. However, the drawback of this approach is that since the knowledge base used is independent of the whole of the input data, the generated regularized data usually only achieves the understanding of local data (words), but it does not reach the global data. Modeling and understanding, resulting in limited error correction capabilities.

在后处理纠错方式中，神经网络的输出并非最终推断结果，而是作为输入特征进一步开展规则化调制，最终进行推断。例如，在专利申请号为CN 201710083292.3的中国申请提供了一种实现方案，其通过人工神经网络的识别出设备区域并结合设备特性设计温度预警规则。当满足温度预警规则时，才发出报警。In the post-processing error correction mode, the output of the neural network is not the final inference result, but is further refined as an input feature, and finally inferred. For example, the Chinese application with the patent application number CN 201710083292.3 provides an implementation scheme for identifying a device area by an artificial neural network and designing a temperature warning rule in combination with device characteristics. An alarm is issued when the temperature warning rule is met.

这种方式虽然同时考虑了设备特性和温度，从而减少错误预警的可能，但是其问题在于：输出结果对规则较为敏感，不适合需要人工神经网络端到端输出的场景。Although this method considers the device characteristics and temperature at the same time, thus reducing the possibility of false alarms, the problem is that the output results are sensitive to rules and are not suitable for scenarios requiring end-to-end output of artificial neural networks.

此外，在专利申请号为US20170300813的美国申请提出一种用于连接不同功能的人工神经网络的数据转换接口技术。该专利申请中提到的转换规则是数据驱动的，而未包含基于人类的知识逻辑的自动纠错规则。在专利申请号为US 20090324010的美国申请中虽然提出对人工神经网络的输出数据进行规则化调制，但是输出结果对规则较为敏感，难以适用于需要神经网络直接输出的场景。在专利申请号为US 5740322的美国申请中提到的模糊规则(Fuzzy rules)是依据训练集优化而得到的，但是并未包含基于人类的知识逻辑的自动纠错规则。In addition, a data conversion interface technique for connecting artificial neural networks of different functions is proposed in the U.S. Patent Application Serial No. US20170300813. The conversion rules mentioned in this patent application are data driven and do not contain automatic error correction rules based on human knowledge logic. Although the US application filed in US Patent Application No. US 20090324010 proposes regular modulation of the output data of the artificial neural network, the output result is sensitive to the rules and is difficult to apply to scenarios requiring direct output by the neural network. The fuzzy rules mentioned in the U.S. application Serial No. 5,740,322 are based on training set optimization, but do not include automatic error correction rules based on human knowledge logic.

针对上述的问题，目前尚未提出有效的解决方案。In response to the above problems, no effective solution has been proposed yet.

发明内容Summary of the invention

本发明至少部分实施例提供了一种规则嵌入式人工神经网络系统及其训练方法，以至少解决相关技术中所提供的人工神经网络的复杂度较高、计算资源需求量较大的技术问题。At least some embodiments of the present invention provide a rule embedded artificial neural network system and a training method thereof, so as to at least solve the technical problem that the artificial neural network provided in the related art has high complexity and large computational resource requirement.

根据本发明其中一实施例，提供了一种规则嵌入式人工神经网络系统，包括：According to one embodiment of the present invention, a rule embedded artificial neural network system is provided, including:

输入层、第一中间层、规则化调制层、第二中间层以及输出层，其中，输入层的输出端分别与第一中间层和第二中间层的输入端相连接，第一中间层的神经元输出端与规则化调制层的输入端相连接，第二中间层与规则化调制层的神经元输出端与输出层的输入端相连接。An input layer, a first intermediate layer, a regularization modulation layer, a second intermediate layer, and an output layer, wherein an output end of the input layer is respectively connected to an input end of the first intermediate layer and the second intermediate layer, the first intermediate layer The neuron output is coupled to the input of the regularized modulation layer, and the second intermediate layer is coupled to the input of the output layer of the neuron output of the regularized modulation layer.

可选地，第一中间层与第二中间层分别包括一层或者多层神经网络。Optionally, the first intermediate layer and the second intermediate layer respectively comprise one or more layers of neural networks.

可选地，上述系统还包括：第三中间层，其中，第二中间层和规则化调制层的神经元输出端经过合并后经由第三中间层与输出层的输入端相连接；或者，取消第二中间层，输入层和规则化调制层的神经元输出端经过合并后经由第三中间层与输出层的输入端相连接。Optionally, the system further includes: a third intermediate layer, wherein the neuron output ends of the second intermediate layer and the regularization modulation layer are combined and connected to the input end of the output layer via the third intermediate layer; or, cancel The second intermediate layer, the input layer and the neuron output of the regularization modulation layer are combined and connected to the input end of the output layer via the third intermediate layer.

可选地，规则化调制层包括：结构化建模和数据转换，其中，结构化建模是指基于人类的专业知识逻辑提取输入数据中的全局结构化特征规律，数据转换是指利用全局结构化特征规律对输入数据中的样本进行分析并转换为神经元输出，以使输入数据中符合全局结构化特征规律的样本对应的神经元产生激活输出以及其余神经元产生非激活输出。Optionally, the regularized modulation layer comprises: structured modeling and data conversion, wherein the structured modeling refers to extracting global structured feature rules in the input data based on human professional knowledge logic, and the data conversion refers to utilizing the global structure The feature rule analyzes the samples in the input data and converts them into neuron outputs, so that the neurons corresponding to the samples in the input data that conform to the global structured feature law generate an activation output and the remaining neurons generate an inactive output.

根据本发明其中一实施例，还提供了一种规则嵌入式人工神经网络的训练方法，应用于上述规则嵌入式人工神经网络系统，包括：According to an embodiment of the present invention, a training method for a rule-embedded artificial neural network is further provided, which is applied to the above-mentioned rule embedded artificial neural network system, including:

构建训练数据及对应标签；构建第一辅助层，其中，第一辅助层的神经元结构与输出层的神经元结构相同，第一辅助层的输入端与第一中间层的神经元输出端相连接；构建第一优化器，并使用第一优化器迭代并优化由输入层、第一中间层以及第一辅助层构成网络的连接权值，其中，第一优化器的损失函数由给定训练数据时，第一辅助层的神经元输出值与对应标签进行比较获得；构建第二优化器，其中，第二优化器的损失函数由给定训练数据时，输出层的神经元输出值与对应标签进行比较获得；固定输入层与第一中间层所构成网络的连接权值，并使用第二优化器迭代优化由输入层、第二中间层、规则化调制层以及输出层构成网络的连接权值。Constructing training data and corresponding labels; constructing a first auxiliary layer, wherein a neuron structure of the first auxiliary layer is the same as a neuron structure of the output layer, and an input end of the first auxiliary layer is opposite to a neuron output end of the first intermediate layer Connecting; constructing a first optimizer and iterating and optimizing a connection weight of the network formed by the input layer, the first intermediate layer, and the first auxiliary layer using the first optimizer, wherein the loss function of the first optimizer is given by a given training In the data, the neuron output value of the first auxiliary layer is obtained by comparing with the corresponding label; and a second optimizer is constructed, wherein when the loss function of the second optimizer is given training data, the output value of the output layer is corresponding to the output value of the neuron The label is obtained by comparison; the connection weight of the network formed by the input layer and the first intermediate layer is fixed, and the second optimizer is used to iteratively optimize the connection right of the network formed by the input layer, the second intermediate layer, the regularization modulation layer and the output layer value.

可选地，当存在第三中间层时，固定输入层与第一中间层所构成网络的连接权值，并使用第二优化器迭代优化由输入层、第二中间层、规则化调制层、第三中间层以及输出层构成网络的连接权值。Optionally, when there is a third intermediate layer, the connection weight of the network formed by the input layer and the first intermediate layer is fixed, and the input layer, the second intermediate layer, the regularization modulation layer, and the iterative optimization are performed by using the second optimizer. The third intermediate layer and the output layer constitute a connection weight of the network.

可选地，当存在第三中间层时，将规则化调制层、第二中间层、第三中间层设置为一个神经网络组成单元，并在输入层与输出层之间以及第一中间层与输出层之间构造大于一个神经网络组成单元，其中，大于一个神经网络组成单元首尾相连。Optionally, when the third intermediate layer is present, the regularized modulation layer, the second intermediate layer, and the third intermediate layer are set as a neural network component unit, and between the input layer and the output layer and the first intermediate layer The structure between the output layers is larger than one neural network component, wherein more than one neural network component is connected end to end.

可选地，当存在大于一个神经网络组成单元时，在通过训练优化每个神经网络组成单元的参数时，构建第二辅助层，其中，第二辅助层的输入端与当前神经网络组成单元的第三中间层的输出端相连接，第二辅助层的输出形状与输出层相同。Optionally, when there is more than one neural network component unit, when optimizing parameters of each neural network component unit by training, constructing a second auxiliary layer, wherein the input end of the second auxiliary layer and the current neural network component unit The output ends of the third intermediate layer are connected, and the output shape of the second auxiliary layer is the same as that of the output layer.

可选地，当存在大于一个神经网络组成单元时，针对每个神经网络组成单元分别构建一个单独的优化器，其中，每个构建的优化器的损失函数由给定训练数据时第一辅助层的神经元输出值与对应标签进行比较获得。Optionally, when there is more than one neural network component unit, a separate optimizer is separately constructed for each neural network component unit, wherein the loss function of each constructed optimizer is given by the first auxiliary layer when the training data is given. The neuron output values are compared to the corresponding labels.

可选地，当存在大于一个神经网络组成单元时，采用构建的多个单独的优化器逐层迭代优化每个神经网络组成单元内部各层神经网络的连接权值。Optionally, when there is more than one neural network component unit, the connection weights of each layer of the neural network in each neural network component unit are optimized step by layer using a plurality of separate optimizers constructed.

可选地，构建的多个单独的优化器采用基于梯度下降的逐层贪婪寻优算法；每个构建的优化器的损失函数是表征输出值与目标值差异程度的可导函数，其中，可导函数包括：交叉熵距离度量函数，均方差距离度量函数。Optionally, the plurality of separate optimizers constructed adopt a gradient-based layer-by-layer greedy optimization algorithm; the loss function of each constructed optimizer is a derivable function that characterizes the difference between the output value and the target value, wherein The derivative functions include: a cross entropy distance metric function, and a mean squared distance metric function.

在本发明至少部分实施例中，采用输入层、第一中间层、规则化调制层、第二中间层以及输出层构建规则嵌入式人工神经网络系统的方式，通过输入层的输出端分别与第一中间层和第二中间层的输入端相连接，第一中间层的神经元输出端与规则化调制层的输入端相连接，第二中间层与规则化调制层的神经元输出端与输出层的输入端相连接，达到了简化人工神经网络架构的目的，从而实现了降低人工神经网络的复杂度以及对计算资源的需求的技术效果，进而解决了相关技术中所提供的人工神经网络的复杂度较高、计算资源需求量较大的技术问题。In at least some embodiments of the present invention, the input layer, the first intermediate layer, the regularization modulation layer, the second intermediate layer, and the output layer are used to construct a rule embedded artificial neural network system, and the output ends of the input layer are respectively An intermediate layer is connected to an input end of the second intermediate layer, a neuron output end of the first intermediate layer is connected to an input end of the regularization modulation layer, and a neuron output end and output of the second intermediate layer and the regularization modulation layer are The input ends of the layers are connected to achieve the purpose of simplifying the artificial neural network architecture, thereby realizing the technical effect of reducing the complexity of the artificial neural network and the demand for computing resources, thereby solving the artificial neural network provided by the related art. Technical problems with high complexity and large demand for computing resources.

DRAWINGS

此处所说明的附图用来提供对本发明的进一步理解，构成本申请的一部分，本发明的示意性实施例及其说明用于解释本发明，并不构成对本发明的不当限定。在附图中：The drawings described herein are intended to provide a further understanding of the invention, and are intended to be a part of the invention. In the drawing:

图1是根据本发明其中一实施例的规则嵌入式人工神经网络系统的基本结构示意图；1 is a schematic diagram showing the basic structure of a rule-embedded artificial neural network system according to an embodiment of the present invention;

图2是根据本发明其中一优选实施例的规则嵌入式人工神经网络系统的扩展结构示意图；2 is a schematic diagram showing an expanded structure of a rule embedded artificial neural network system according to a preferred embodiment of the present invention;

图3是根据本发明其中一实施例的规则嵌入式人工神经网络的训练方法的流程图；3 is a flow chart of a training method of a rule embedded artificial neural network according to an embodiment of the present invention;

图4是根据本发明其中一优选实施例的全卷积网络的结构示意图；4 is a schematic structural diagram of a full convolution network according to a preferred embodiment of the present invention;

图5是根据本发明其中一优选实施例的二次堆叠的规则嵌入式人工神经网络的结构示意图。5 is a block diagram showing the structure of a regularly stacked artificial neural network of a secondary stack in accordance with one of the preferred embodiments of the present invention.

Detailed ways

为了使本技术领域的人员更好地理解本发明方案，下面将结合本发明实施例中的附图，对本发明实施例中的技术方案进行清楚、完整地描述，显然，所描述的实施例仅仅是本发明一部分的实施例，而不是全部的实施例。基于本发明中的实施例，本领域普通技术人员在没有做出创造性劳动前提下所获得的所有其他实施例，都应当属于本发明保护的范围。The technical solutions in the embodiments of the present invention are clearly and completely described in the following with reference to the accompanying drawings in the embodiments of the present invention. It is an embodiment of the invention, but not all of the embodiments. All other embodiments obtained by those skilled in the art based on the embodiments of the present invention without creative efforts shall fall within the scope of the present invention.

需要说明的是，本发明的说明书和权利要求书及上述附图中的术语“第一”、“第二”等是用于区别类似的对象，而不必用于描述特定的顺序或先后次序。应该理解这样使用的数据在适当情况下可以互换，以便这里描述的本发明的实施例能够以除了在这里图示或描述的那些以外的顺序实施。此外，术语“包括”和“具有”以及他们的任何变形，意图在于覆盖不排他的包含，例如，包含了一系列步骤或单元的过程、方法、系统、产品或设备不必限于清楚地列出的那些步骤或单元，而是可包括没有清楚地列出的或对于这些过程、方法、产品或设备固有的其它步骤或单元。It is to be understood that the terms "first", "second" and the like in the specification and claims of the present invention are used to distinguish similar objects, and are not necessarily used to describe a particular order or order. It is to be understood that the data so used may be interchanged where appropriate, so that the embodiments of the invention described herein can be implemented in a sequence other than those illustrated or described herein. In addition, the terms "comprises" and "comprises" and "the" and "the" are intended to cover a non-exclusive inclusion, for example, a process, method, system, product, or device that comprises a series of steps or units is not necessarily limited to Those steps or units may include other steps or units not explicitly listed or inherent to such processes, methods, products or devices.

根据本发明实施例，提供了一种规则嵌入式人工神经网络系统的实施例，需要说明的是，在附图的流程图示出的步骤可以在诸如一组计算机可执行指令的计算机系统中执行，并且，虽然在流程图中示出了逻辑顺序，但是在某些情况下，可以以不同于此处的顺序执行所示出或描述的步骤。In accordance with an embodiment of the present invention, an embodiment of a rule-embedded artificial neural network system is provided, it being noted that the steps illustrated in the flowchart of the figures may be performed in a computer system such as a set of computer executable instructions And, although the logical order is shown in the flowcharts, in some cases the steps shown or described may be performed in a different order than the ones described herein.

图1是根据本发明其中一实施例的规则嵌入式人工神经网络系统的基本结构示意图，如图1所示，该规则嵌入式人工神经网络系统包括：输入层1、第一中间层2、规则化调制层3、第二中间层4以及输出层5，其中，输入层的输出端分别与第一中间层和第二中间层的输入端相连接，第一中间层的神经元输出端与规则化调制层的输入端相连接，第二中间层与规则化调制层的神经元输出端与输出层的输入端相连接。1 is a schematic diagram showing the basic structure of a rule-embedded artificial neural network system according to an embodiment of the present invention. As shown in FIG. 1, the rule embedded artificial neural network system includes: an input layer 1, a first intermediate layer 2, and a rule. a modulation layer 3, a second intermediate layer 4, and an output layer 5, wherein the output ends of the input layers are respectively connected to the input ends of the first intermediate layer and the second intermediate layer, and the neuron output ends and rules of the first intermediate layer The input ends of the modulation layer are connected, and the second intermediate layer and the neuron output of the regular modulation layer are connected to the input end of the output layer.

上述规则化调制层可以包括：结构化建模和数据转换，其中，结构化建模是指基于人类的专业知识逻辑提取输入数据中的全局结构化特征规律，数据转换是指利用全局结构化特征规律对输入数据中的样本进行分析并转换为神经元输出，以使输入数据中符合全局结构化特征规律的样本对应的神经元产生激活输出以及其余神经元产生非激活输出。The above regularized modulation layer may include: structured modeling and data conversion, wherein structured modeling refers to extracting global structured feature rules in input data based on human professional knowledge logic, and data conversion refers to utilizing global structured features The samples in the input data are analyzed and converted into neuron outputs such that the neurons corresponding to the samples in the input data that conform to the global structured feature law produce an activation output and the remaining neurons produce an inactive output.

可选地，上述第一中间层与第二中间层分别包括一层或者多层神经网络。Optionally, the first intermediate layer and the second intermediate layer respectively comprise one or more layers of neural networks.

可选地，图2是根据本发明其中一优选实施例的规则嵌入式人工神经网络系统的扩展结构示意图，如图2所示，该规则嵌入式人工神经网络系统还包括：第三中间层6，其中，第二中间层和规则化调制层的神经元输出端经过合并后经由第三中间层与输出层的输入端相连接。2 is a schematic diagram of an extended structure of a rule-embedded artificial neural network system according to a preferred embodiment of the present invention. As shown in FIG. 2, the rule embedded artificial neural network system further includes: a third intermediate layer 6 And wherein the neuron output ends of the second intermediate layer and the regularization modulation layer are combined and connected to the input end of the output layer via the third intermediate layer.

可选地，当存在第三中间层时，还可以取消第二中间层，并将输入层和规则化调制层的神经元输出端经过合并后经由第三中间层与输出层的输入端相连接。Optionally, when the third intermediate layer is present, the second intermediate layer may also be cancelled, and the input ends and the neuron output ends of the regularized modulation layer are combined and connected to the input end of the output layer via the third intermediate layer. .

相关技术中通常不会将规则置于人工神经网络的中间层，其原因在于：规则产生的结果通常会伴随着数据的细微变化而产生突变。换言之，规则化调制通常是不可微分的过程，而不可微分的过程会破坏人工神经网络的参数优化过程，进而使其无法收敛到极优值，导致参数无法求解。为此，本发明至少部分实施例将纠错过程置于神经网络的中间层，将人类的知识逻辑以规则化调制的形式对神经网络的中间层(又称特征层)进行结构化建模和数据转换，从而达到了如下技术效果：In the related art, the rule is usually not placed in the middle layer of the artificial neural network because the result of the rule is usually accompanied by a slight change in the data to cause a mutation. In other words, regularized modulation is usually a process that cannot be differentiated, and the process of non-differentiation destroys the parameter optimization process of the artificial neural network, so that it cannot converge to a very good value, and the parameters cannot be solved. To this end, at least some embodiments of the present invention place the error correction process in the middle layer of the neural network, and structurally model the intermediate layer (also called the feature layer) of the neural network in the form of regularized modulation of human knowledge logic. Data conversion, thus achieving the following technical effects:

首先，结构化建模是对数据全局的建模和理解，克服了前处理方式的技术缺陷。First, structured modeling is the modeling and understanding of data globally, overcoming the technical flaws of preprocessing.

其次，该网络采用两步训练法对神经网络分别进行训练，克服了规则化调制导致的不可微分问题，也克服了后处理纠错方式对规则敏感的问题。Secondly, the network uses the two-step training method to train the neural network separately, which overcomes the indivisible problem caused by regularized modulation, and overcomes the problem that the post-processing error correction method is sensitive to rules.

再次，由于引入了人类的知识逻辑，因此，不但降低了人工神经网络受噪声影响而犯错的可能性，而且可以减少训练人工神经网络时所需的样本数量。Thirdly, due to the introduction of human knowledge logic, it not only reduces the possibility of artificial neural networks being affected by noise, but also reduces the number of samples required to train artificial neural networks.

最后，由于规则能够对全局逻辑结构进行约束，人工神经网络只需要聚焦于局部信息结构的统计分析，进而能够降低人工神经网络的复杂度和对计算资源的需求。Finally, because the rules can constrain the global logical structure, the artificial neural network only needs to focus on the statistical analysis of the local information structure, which can reduce the complexity of the artificial neural network and the demand for computing resources.

根据本发明其中一实施例，还提供了一种规则嵌入式人工神经网络的训练方法，应用于上述规则嵌入式人工神经网络系统。图3是根据本发明其中一实施例的规则嵌入式人工神经网络的训练方法的流程图，如图3所示，该方法包括如下步骤：According to an embodiment of the present invention, a training method for a rule embedded artificial neural network is further provided, which is applied to the above rule embedded artificial neural network system. FIG. 3 is a flowchart of a training method of a rule embedded artificial neural network according to an embodiment of the present invention. As shown in FIG. 3, the method includes the following steps:

步骤S30，构建训练数据及对应标签；Step S30, constructing training data and corresponding labels;

步骤S31，构建第一辅助层，其中，第一辅助层的神经元结构与输出层的神经元结构相同，第一辅助层的输入端与第一中间层的神经元输出端相连接；Step S31, constructing a first auxiliary layer, wherein a neuron structure of the first auxiliary layer is the same as a neuron structure of the output layer, and an input end of the first auxiliary layer is connected to a neuron output end of the first intermediate layer;

步骤S32，构建第一优化器，并使用第一优化器迭代并优化由输入层、第一中间层以及第一辅助层构成网络的连接权值，其中，第一优化器的损失函数由给定训练数据时，第一辅助层的神经元输出值与对应标签进行比较获得；Step S32, constructing a first optimizer, and using a first optimizer to iterate and optimize a connection weight of the network formed by the input layer, the first intermediate layer, and the first auxiliary layer, wherein the loss function of the first optimizer is given by When training data, the neuron output value of the first auxiliary layer is obtained by comparing with the corresponding label;

步骤S33，构建第二优化器，其中，第二优化器的损失函数由给定训练数据时，输出层的神经元输出值与对应标签进行比较获得；Step S33, constructing a second optimizer, wherein the loss function of the second optimizer is obtained by comparing the neuron output value of the output layer with the corresponding tag when the training data is given;

步骤S34，固定输入层与第一中间层所构成网络的连接权值，并使用第二优化器迭代优化由输入层、第二中间层、规则化调制层以及输出层构成网络的连接权值。Step S34, the connection weights of the network formed by the input layer and the first intermediate layer are fixed, and the connection weights of the network formed by the input layer, the second intermediate layer, the regularization modulation layer and the output layer are iteratively optimized using the second optimizer.

可选地，在步骤S34中，当存在第三中间层时，固定输入层与第一中间层所构成网络的连接权值，并使用第二优化器迭代优化由输入层、第二中间层、规则化调制层、第三中间层以及输出层构成网络的连接权值。Optionally, in step S34, when there is a third intermediate layer, the connection weight of the network formed by the input layer and the first intermediate layer is fixed, and the input layer and the second intermediate layer are iteratively optimized by using the second optimizer. The regularization modulation layer, the third intermediate layer, and the output layer constitute a connection weight of the network.

下面将结合以下多个优选实施例对上述优选实施过程作进一步地详细说明。The above preferred implementation process will be further described in detail below in conjunction with the following preferred embodiments.

优选实施例一Preferred embodiment 1

该优选实施例提供了一种规则嵌入式人工神经网络及其推断过程。The preferred embodiment provides a rule embedded artificial neural network and its inference process.

在规则嵌入式人工神经网络中，给定输入数据是时间长度为N秒的心电图采样数据，采样率为128Hz，心电图导联数为C。给定人工神经网络的目标是检测输入数据中R波的波峰位置。In the rule-embedded artificial neural network, given input data is ECG sample data with a time length of N seconds, the sampling rate is 128 Hz, and the number of ECG leads is C. The goal of a given artificial neural network is to detect the peak position of the R wave in the input data.

该优选实施例提供的规则嵌入式人工神经网络包括：输入层、中间层A(相当于上述第一中间层)、中间层B(相当于上述第二中间层)、规则化调制层A以及输出层。The ruled embedded artificial neural network provided by the preferred embodiment includes: an input layer, an intermediate layer A (corresponding to the first intermediate layer), an intermediate layer B (corresponding to the second intermediate layer), a regularized modulation layer A, and an output. Floor.

具体地，输入层将输入数据转换为三维空间数据，其数据形状为[1,128*N,C]。中间层A和中间层B均由全卷积网络构成。关于全卷积网络可以参考如下文献：http://www.cnblogs.com/gujianhan/p/6030639.html或Long J,Shelhamer E,Darrell T,et al.Fully convolutional networks for semantic segmentation[J].computer vision and pattern recognition,2015:3431-3440)。图4是根据本发明其中一优选实施例的全卷积网络的结构示意图，如图4所示，在该优选实施例中，全卷积网络的结构包括12个卷积层、6个批量规范化(batch normalization)层、4个最大值池化层、4个反卷积层和4个拼接层，各层之间的顺序如图中箭头方向所示。Specifically, the input layer converts the input data into three-dimensional spatial data whose data shape is [1, 128*N, C]. Both the intermediate layer A and the intermediate layer B are composed of a full convolution network. For the full convolutional network, please refer to the following documents: http://www.cnblogs.com/gujianhan/p/6030639.html or Long J, Shelhamer E, Darrell T, et al. Fully convolutional networks for semantic segmentation [J]. Computer vision and pattern recognition, 2015: 3431-3440). 4 is a schematic structural diagram of a full convolutional network according to a preferred embodiment of the present invention. As shown in FIG. 4, in the preferred embodiment, the structure of the full convolutional network includes 12 convolutional layers and 6 batch normalizations. (batch normalization) layer, 4 maximum pooling layers, 4 deconvolution layers, and 4 stitching layers, the order between the layers is as shown by the direction of the arrow in the figure.

上述卷积层用于捕获数据中的局部模式。上述最大池化层用于总结局部模式并扩大后续神经元的视野。上述批量规范化层用于克服神经网络的梯度消失问题并加快模型训练速度。上述反卷积层用于对神经网络进行上采样，使得输出数据的每个点正好对应到输入数据的每个点。上述拼接层用于将神经网络浅层的高分辨率特征引入到深层，从而提高深层的分辨率。上述池化层的下采样率、反卷积层的上采样率均设置为 2。最后一个卷积层的模板长度为1，其它卷积层的模板长度为3。The convolutional layer described above is used to capture local patterns in the data. The maximum pooling layer described above is used to summarize the local patterns and to expand the field of view of subsequent neurons. The above batch normalization layer is used to overcome the gradient disappearance problem of the neural network and speed up the model training. The deconvolution layer described above is used to upsample the neural network such that each point of the output data corresponds exactly to each point of the input data. The above-mentioned stitching layer is used to introduce high-resolution features of the shallow layer of the neural network into the deep layer, thereby improving the resolution of the deep layer. The down sampling rate of the above pooled layer and the upsampling rate of the deconvolution layer are both set to 2. The template length of the last convolutional layer is 1, and the length of the template of other convolutional layers is 3.

进一步地，在上述结构后增加1个卷积层，其输入包括上述5个卷积层的第一个卷积层的输出和上述5个反卷积层的最后一个反卷积层的输出，如参考文献所述，通过在网络后端(所述最后增加的卷积层)引入前层网络的特征(第一个卷积层的输出)，可以结合信号中的低级特征和高级特征，使得全卷积网络产生准确的输出。Further, after the above structure, a convolution layer is added, and the input includes an output of the first convolution layer of the five convolution layers and an output of the last deconvolution layer of the five deconvolution layers, As described in the references, by introducing features of the front layer network (the output of the first convolutional layer) at the back end of the network (the last added convolutional layer), it is possible to combine low-level features and high-level features in the signal such that A full convolutional network produces accurate output.

另外，各卷积层神经元采用ReLU激活函数。In addition, each convolutional layer neuron uses a ReLU activation function.

中间层A和中间层B的输出数据形状分别为[1,128*N,C1]和[1,128*N,C2]。The output data shapes of the intermediate layer A and the intermediate layer B are [1, 128*N, C1] and [1, 128*N, C2], respectively.

规则化调制层A包括：心电图数据的结构化建模和数据转换。The regularized modulation layer A includes: structured modeling and data conversion of electrocardiogram data.

基于人类对心电图的专业知识，心电图的R波的全局结构化特征规律包括：不应期内(例如：0.3秒)最多只能出现1次R波；心脏产生R波的过程存在一定的节律性，由自主神经和非自主神经共同调节；短时间(例如：60秒)的R波之间的间期的样本标准差通常不会发生剧烈变化(健康状态下波动较小，房颤、早搏等疾病状态下波动较大)。Based on human expertise in electrocardiogram, the global structural characteristics of R wave of ECG include: R wave can only occur at most once during the refractory period (for example: 0.3 seconds); there is a certain rhythm in the process of R wave generated by the heart. , co-regulated by autonomic and non-autonomic nerves; the standard deviation of the interval between R waves in a short period of time (eg, 60 seconds) usually does not change drastically (less fluctuations in healthy state, atrial fibrillation, premature beats, etc.) Fluctuations in the disease state).

结构化建模过程包括：依据上述全局结构化特征规律对中间层A输出数据的每个通道提取极大值集合，其中，极大值满足大于该通道前0.3秒范围内的值且不小于该通道后0.3秒范围内的值；然后针对每个通道，通过迭代的方式获取选定样本点。The structured modeling process includes: extracting a maximum value set for each channel of the intermediate layer A output data according to the global structural feature rule, wherein the maximum value satisfies a value greater than 0.3 seconds before the channel and is not less than the value The value in the range of 0.3 seconds after the channel; then for each channel, the selected sample points are obtained iteratively.

通过迭代的方式获取选定样本点可以分为以下执行步骤：Obtaining selected sample points in an iterative manner can be divided into the following steps:

第一步、对极大值对应的数据幅值进行统计分析，排除其中的异常值，并从极大值集合中删除与异常值对应的元素，排除数量记为e1。排除异常值的方式可以依据3*sigma准则，即，假设数据服从正态分布，计算样本均值和样本标准差，与样本均值距离在3倍样本标准差以外的样本即可认为是异常值。In the first step, statistical analysis is performed on the amplitude of the data corresponding to the maximum value, and the abnormal value is excluded, and the element corresponding to the abnormal value is deleted from the maximum value set, and the excluded quantity is recorded as e1. The way to exclude outliers can be based on the 3*sigma criterion, that is, the data is assumed to follow a normal distribution, and the sample mean and sample standard deviation are calculated. Samples with a sample mean distance other than 3 times the sample standard deviation can be considered as outliers.

第二步、在从极大值集合中删除与异常值对应的元素之后，依次计算极大值集合中剩余的极大值中相邻极大值之间的时间差dT _j，记录三元组<dT _j,TS _j,TE _j>，其中TS _j,TE _j分别表示所述时间差的起点和终点(即上述两个相邻极大值)，j表示各个时间差的编号。 In the second step, after deleting the element corresponding to the outlier from the set of maxima, sequentially calculating the time difference dT _j between adjacent maxima in the maxima of the maxima set, and recording the triad< dT _j , TS _j , TE _j >, where TS _j , TE _j represent the start and end points of the time difference respectively (ie, the two adjacent maxima above), and j represents the number of each time difference.

相邻极大值之间的时间差计算公式为：dT _j＝L _j/r，其中，L _j表示两个极大值之间的数据长度，r表示采样率。对于全卷积网络，其输出数据的采样率通常与原输入信号的采样率一致。 The time difference between adjacent maxima is calculated as: dT _j = L _j /r, where L _j represents the data length between the two maxima and r represents the sampling rate. For a full convolutional network, the sample rate of the output data is usually the same as the sample rate of the original input signal.

第三步、对时间差进行统计分析，排除其中的异常值，排除数量记为e2。In the third step, statistical analysis is performed on the time difference, and the abnormal value is excluded, and the excluded quantity is recorded as e2.

排除异常值的方式可以依据3*sigma准则，即，假设数据服从正态分布，计算样本均值和样本标准差，与样本均值距离在3倍样本标准差以外的样本即可认为是异常值。The way to exclude outliers can be based on the 3*sigma criterion, that is, the data is assumed to follow a normal distribution, and the sample mean and sample standard deviation are calculated. Samples with a sample mean distance other than 3 times the sample standard deviation can be considered as outliers.

第四步、使用剩余的时间差对应的上述三元组中的起点和终点更新极大值集合，若e1>0或者e2>0，转到第一步继续迭代。In the fourth step, the maximum value set is updated using the start point and the end point in the above-mentioned triples corresponding to the remaining time difference. If e1>0 or e2>0, go to the first step to continue the iteration.

第五步、将极大值集合的元素标记为选定样本点。In the fifth step, the elements of the maximum value set are marked as selected sample points.

数据转换过程包括：生成与输入数据相同形状的输出数据，并将选定极大值对应的位置设置为1(表示激活输出)，其它位置设置为0(表示非激活输出)。The data conversion process includes: generating output data of the same shape as the input data, and setting the position corresponding to the selected maximum value to 1 (indicating the activation output), and setting the other position to 0 (indicating the inactive output).

合并中间层B和规则化调制层A的输出数据，作为输入连接到输出层，输出层采用一维卷积操作对输入数据进行处理，产生形状为[1,128*N,2]的数据。然后，通过Reshape重构为[128*N,2]的数据。最后，使用Softmax激活函数产生输出数据，输出数据的最后1列表示对应位置为R波的概率。The output data of the intermediate layer B and the regularization modulation layer A are merged and connected as an input to the output layer, and the output layer processes the input data by a one-dimensional convolution operation to generate data of a shape of [1, 128*N, 2]. Then, the data of [128*N, 2] is reconstructed by Reshape. Finally, the output data is generated using the Softmax activation function, and the last column of the output data indicates the probability that the corresponding position is an R wave.

进一步地，在规则嵌入式人工神经网络的输出层之前还可以增加一个中间层C(相当于上述第三中间层)，以增加神经网络的准确性。中间层C也可采用全卷积网络，网络结构与上述中间层A和中间层B类似；各个中间层的网络深度(层数)、卷积层的模板长度等结构参数均可根据实际应用场景进行调整。Further, an intermediate layer C (corresponding to the third intermediate layer described above) may be added before the output layer of the rule-embedded artificial neural network to increase the accuracy of the neural network. The intermediate layer C can also adopt a full convolutional network, and the network structure is similar to the intermediate layer A and the intermediate layer B; the structural parameters such as the network depth (layer number) and the template length of the convolution layer of each intermediate layer can be based on actual application scenarios. Make adjustments.

出于便捷性考虑，使用全卷积网络而不使用相关技术中提供的推断单样本的人工神经网络。全卷积网络能够将整条数据中的所有样本(心电图的采样点)作为输入，并提供所有样本的推断结果(每个时间点是否属于R波波峰)，由此一方面能够减少计算量，另一方面便于对整条数据进行结构化建模。当然也可以使用只能推断单样本的人工神经网络模型，只不过需要进行多次推断后再进行结构化建模，规则化调制的输出数据也需要使用单个采样点形式的逐个输入后续的神经元。For convenience, a full convolutional network is used instead of the inferred single sample artificial neural network provided in the related art. The full convolutional network can take all the samples in the entire data (sample points of the ECG) as input, and provide the inference results of all the samples (whether each time point belongs to the R wave peak), thereby reducing the amount of calculation on the one hand. On the other hand, it is convenient to carry out structured modeling of the entire data. It is of course also possible to use an artificial neural network model that can only infer a single sample, but only requires multiple inferences before structural modeling. The output data of the regularized modulation also needs to input subsequent neurons one by one in the form of a single sampling point. .

进一步地，上述规则嵌入式人工神经网络的训练过程可以包括以下执行步骤：Further, the training process of the above rule embedded artificial neural network may include the following steps:

第一步、构建数据集。数据集包括：M次测量的心电图数据和对应标签。每次测量的心电图数据包含连续采样的L _i个采样点(i＝1,…,M)。各次测量数据的对应标签为L _i行2列的矩阵。矩阵每行取值分别为：[1,0]和[0,1]，其中，[1,0]表示对应位置不是R波波峰，[0,1]表示对应位置是R波波峰。 The first step is to build a data set. The data set includes: electrocardiogram data and corresponding labels of M measurements. The electrocardiogram data for each measurement contains L _i sample points (i = 1, ..., M) that are continuously sampled. The corresponding label of each measurement data is a matrix of 2 columns of L _i rows. The values of each row of the matrix are: [1,0] and [0,1], where [1,0] indicates that the corresponding position is not the R wave peak, and [0,1] indicates that the corresponding position is the R wave peak.

第二步、将数据集中部分(例如：70％)心电图及其上述对应标签划分为训练数据集，剩余的心电图及其对应标签划分为验证数据集。In the second step, the electrocardiogram (eg, 70%) and the corresponding label in the data set are divided into training data sets, and the remaining electrocardiograms and their corresponding labels are divided into verification data sets.

第三步、构建辅助层，其结构与输出层结构一致，并将中间层A的神经元输出作为输入连接到辅助层。The third step is to construct an auxiliary layer whose structure is consistent with the output layer structure, and the neuron output of the intermediate layer A is connected as an input to the auxiliary layer.

第四步、构建第一优化器，其损失函数由给定训练数据时，辅助层的神经元输出值与对应标签进行比较获得。例如，当输入某次测量的心电图数据(其对应标签为z)时，辅助层的神经元输出向量为x，则损失函数可设计为：x-x*z+log(1+exp(-x))，其中，log()和exp()分别表示取对数(e为底)和取指数(e为底)。The fourth step is to construct a first optimizer whose loss function is obtained by comparing the neuron output value of the auxiliary layer with the corresponding tag when the training data is given. For example, when inputting a measured ECG data (the corresponding label is z), the neuron output vector of the auxiliary layer is x, then the loss function can be designed as: xx*z+log(1+exp(-x)) Where log() and exp() represent the logarithm (e is the base) and the exponent (e is the base), respectively.

第五步、使用第一优化器迭代优化输入层、中间层A以及辅助层所构成网络的连接权值。In the fifth step, the first optimizer is used to iteratively optimize the connection weights of the network formed by the input layer, the intermediate layer A, and the auxiliary layer.

第六步、构建第二优化器，其损失函数由给定训练数据时，输出层的神经元输出值与对应标签进行比较获得。In the sixth step, the second optimizer is constructed, and the loss function is obtained by comparing the neuron output value of the output layer with the corresponding label when the training data is given.

第七步、固定输入层与中间层A所构成网络的连接权值，使用第二优化器迭代优化输入层、中间层B、规则化调制层A、中间层C(若存在)以及输出层所构成网络的连接权值。The seventh step, the connection weight of the network formed by the fixed input layer and the intermediate layer A, uses the second optimizer to iteratively optimize the input layer, the intermediate layer B, the regularized modulation layer A, the intermediate layer C (if present), and the output layer The connection weights that make up the network.

需要说明的是，上述优化器采用基于梯度下降的逐层贪婪训练方法。上述损失函数可以是交叉熵、均方差等距离度量函数。It should be noted that the above optimizer adopts a layer-by-layer greedy training method based on gradient descent. The above loss function may be a cross-entropy, mean square error equal distance metric function.

优选实施例二Preferred embodiment two

该优选实施例与优选实施例一的区别在于，结构化建模的修改如下：The difference between this preferred embodiment and the preferred embodiment is that the structural modeling is modified as follows:

依据上述全局结构化特征规律，对中间层A输出数据的每个通道提取极大值集合，该极大值满足大于该通道前0.3秒范围内的值且不小于该通道后0.3秒范围内的值；然后针对每个通道，通过以下方式获取选定样本点：According to the global characterization feature rule, a maximum value set is extracted for each channel of the output data of the intermediate layer A, and the maximum value satisfies a value within a range of 0.3 seconds before the channel and is not less than 0.3 seconds after the channel Value; then for each channel, the selected sample points are obtained by:

第一步、对极大值对应的数据幅值进行聚类分析，产生3个聚类，以及选取其中元素最多的聚类更新极大值集合。In the first step, the data amplitude corresponding to the maximum value is clustered, three clusters are generated, and the cluster update maximum value set with the most elements is selected.

第二步、以每个极大值位置为起点(记为Ps)，向后搜索极大值作为匹配终点(记为Pe)，计算Ps与Pe之间的时间差(记为dT)，其匹配条件为：dT<1.5秒。In the second step, starting with each maximal position (denoted as Ps), searching for the maximum value as the matching end point (denoted as Pe), and calculating the time difference between Ps and Pe (denoted as dT), which matches The condition is: dT < 1.5 seconds.

第三步、使用所有满足条件的三元组<Ps,Pe,dT>生成匹配集合。In the third step, a matching set is generated using all the triples <Ps, Pe, dT> satisfying the condition.

第四步、使用聚类技术分析匹配集合中的dT，产生3个匹配聚类，选取其中元素最多的匹配聚类更新该匹配集合，以及提取匹配集合内的极大值更新极大值集合。In the fourth step, the clustering technique is used to analyze the dT in the matching set, and three matching clusters are generated, and the matching cluster with the most elements is selected to update the matching set, and the maximum value updating maximum value set in the matching set is extracted.

该优选实施例使用聚类技术替换基于3*sigma准则的异常点排除方法，适用于异常的不是高斯分布从而不适合3*sigma准则的场景。The preferred embodiment replaces the abnormal point exclusion method based on the 3*sigma criterion by using a clustering technique, and is applicable to a scene in which an abnormality is not a Gaussian distribution and thus is not suitable for the 3*sigma criterion.

优选实施例三Preferred embodiment three

该优选实施例与优选实施例一的区别在于，结构化建模过程修改如下：The difference between this preferred embodiment and the preferred embodiment is that the structured modeling process is modified as follows:

依据上述全局结构化特征规律，对中间层A输出数据的每个通道提取极大值集合，该极大值满足大于该通道前0.3秒范围内的值且不小于该通道后0.3秒范围内的值；然后针对每个通道，通过以下方式获取选定样本点及其对应的分类得分：According to the global characterization feature rule, a maximum value set is extracted for each channel of the output data of the intermediate layer A, and the maximum value satisfies a value within a range of 0.3 seconds before the channel and is not less than 0.3 seconds after the channel Value; then for each channel, the selected sample points and their corresponding classification scores are obtained by:

第一步、对极大值对应的数据幅值进行聚类分析，产生3个聚类，选取其中元素最多的聚类，将聚类元素对应的极大值标记为选定样本点。In the first step, the data amplitude corresponding to the maximum value is clustered, and three clusters are generated, and the cluster with the most elements is selected, and the maximum value corresponding to the cluster element is marked as the selected sample point.

第二步、以每个极大值位置为起点(记为Ps)，向后搜索极大值作为匹配终点(记为Pe)，计算Ps与Pe之间的时间差(记为dT)，匹配条件为：dT<1.5秒。In the second step, starting with each maximal position (denoted as Ps), searching for the maximum value as the matching end point (denoted as Pe), calculating the time difference between Ps and Pe (denoted as dT), matching condition Is: dT < 1.5 seconds.

第四步、计算全局特征：匹配集合内dT的样本均值、样本标准差、中值，分别记为dTavg、dTstd、dTmid。The fourth step is to calculate the global feature: the sample mean, sample standard deviation, and median of the dT in the matching set are recorded as dTavg, dTstd, and dTmid, respectively.

第五步、计算每个极大值的局部特征：包含该极大值的三元组个数(记为nt)、包含该极大值的三元组内dT的样本均值和样本标准差(分别记为dTiavg、dTistd)、该极大值与前一极大值之间的时间差(记为dTif)，该极大值与后一极大值之间的时间差(记为dTil)。The fifth step is to calculate the local feature of each maximum value: the number of triplets including the maximum value (denoted as nt), the sample mean value of the dT in the triplet containing the maximum value, and the sample standard deviation ( Respectively denoted as dTiavg, dTistd), the time difference between the maximum value and the previous maximum value (denoted as dTif), the time difference between the maximum value and the latter maximum value (denoted as dTil).

第六步、使用机器学习模型(例如：SVM、神经网络)，输入上述全局特征和局部特征，输出每个选定样本点的分类得分。In the sixth step, using a machine learning model (for example, SVM, neural network), the above global features and local features are input, and the classification score of each selected sample point is output.

机器学习模型是在规则嵌入式人工神经网络训练过程中新增第三优化器，其损失函数由给定训练数据时上述特征值与对应位置标签进行比较获得；然后，通过与机器学习模型相对应的训练方法优化机器学习模型的参数。The machine learning model adds a third optimizer in the process of regular embedded artificial neural network training. The loss function is obtained by comparing the above eigenvalues with corresponding position labels when given training data; and then corresponding to the machine learning model. The training method optimizes the parameters of the machine learning model.

此外，该优选实施例与优选实施例一的区别在于，数据转换过程修改如下：Moreover, the difference between the preferred embodiment and the preferred embodiment 1 is that the data conversion process is modified as follows:

生成与输入数据相同形状的输出数据，并将选定样本点对应的位置设置为对应的分类输出(表示激活输出)，其它位置设置为0(表示非激活输出)。Generates output data of the same shape as the input data, and sets the position corresponding to the selected sample point to the corresponding classification output (indicating the activation output), and the other positions to 0 (indicating the inactive output).

在该优选实施例中，全局结构化特征规律通过特征提取算法来表达，其所提取的特征经过机器学习模型进一步转化为对应位置的激活值。这种方法相比于纯规则算法，无需经验设定边界条件(例如：3*sigma)，不会因边界条件导致不可微分问题，因而人工神经网络模型更稳定，适用范围更广泛。In the preferred embodiment, the global structured feature law is expressed by a feature extraction algorithm, and the extracted features are further transformed into activation values of corresponding locations by the machine learning model. Compared with the purely regular algorithm, this method does not require empirically setting boundary conditions (for example: 3*sigma), and does not cause indivisible problems due to boundary conditions. Therefore, the artificial neural network model is more stable and has a wider range of applications.

优选实施例四Preferred embodiment four

该优选实施例与优选实施例一的区别在于：将中间层A和中间层B进行部分共享。例如：共享前五个卷积层的参数。这种方式能够简化人工神经网络模型的复杂度，并提高人工神经网络模型训练和推断的计算效率。The preferred embodiment differs from the preferred embodiment 1 in that the intermediate layer A and the intermediate layer B are partially shared. For example: share the parameters of the first five convolutional layers. This method can simplify the complexity of the artificial neural network model and improve the computational efficiency of artificial neural network model training and inference.

优选实施例五Preferred embodiment five

该优选实施例与优选实施例一的区别在于：将规则化调制层A、中间层B和中间层C进行二次或多次堆叠。图5是根据本发明其中一优选实施例的二次堆叠的规则嵌入式人工神经网络的结构示意图，如图5所示，将输入层的输出作为输入连接到中间层D，将中间层C的输出作为输入连接到规则化调制层B，将中间层D和规则化调制层B的输出一并作为输入连接到中间层E以及将中间层E的输出作为输入连接到输出层，其中，将规则化调制层A、中间层B以及中间层C设置为第一神经网络组成单元，将规则化调制层B、中间层D以及中间层E设置为第二神经网络组成单元。由此，在上述二次堆叠的规则嵌入式人工神经网络中便存在两个神经网络组成单元(即上述大于一个神经网络组成单元)且第一神经网络组成单元与第二神经网络组成单元首尾相连。The preferred embodiment differs from the preferred embodiment 1 in that the regularized modulation layer A, the intermediate layer B, and the intermediate layer C are stacked two or more times. 5 is a schematic structural diagram of a regularly stacked artificial neural network of a second stack according to a preferred embodiment of the present invention. As shown in FIG. 5, the output of the input layer is connected as an input to the intermediate layer D, and the intermediate layer C is The output is connected as an input to the regularized modulation layer B, the outputs of the intermediate layer D and the regularized modulation layer B are connected as an input to the intermediate layer E and the output of the intermediate layer E is connected as an input to the output layer, wherein the rules are The modulation layer A, the intermediate layer B, and the intermediate layer C are disposed as a first neural network component unit, and the regularization modulation layer B, the intermediate layer D, and the intermediate layer E are set as second neural network constituent units. Therefore, in the above-mentioned secondary stacked regular embedded artificial neural network, there are two neural network constituent units (that is, the above is greater than one neural network constituent unit), and the first neural network constituent unit and the second neural network constituent unit are connected end to end. .

需要说明的是，规则化调制层B与规则化调制层A分别采用不同的结构化建模或者数据转换方式。It should be noted that the regularized modulation layer B and the regularized modulation layer A respectively adopt different structural modeling or data conversion methods.

第一步、构建训练数据及对应标签，其中，训练数据包括M次测量的心电图数据，每次测量的心电图数据包含连续采样的Li个采样点(i＝1,…,M)。各次测量数据的对应标签为Li行2列的矩阵，其中，矩阵每行取值分为[1,0]和[0,1]，其中，[1,0]表示对应位置不是R波波峰，[0,1]表示对应位置是R波波峰。In the first step, the training data and the corresponding label are constructed, wherein the training data includes the electrocardiogram data of the M measurements, and the electrocardiogram data of each measurement includes the Li sampling points (i=1, . . . , M) that are continuously sampled. The corresponding label of each measurement data is a matrix of 2 rows of Li rows, wherein each row of the matrix is divided into [1, 0] and [0, 1], wherein [1, 0] indicates that the corresponding position is not the peak of the R wave. , [0, 1] indicates that the corresponding position is an R wave peak.

第二步、构建辅助层A(相当于上述第一辅助层)，其结构与输出层结构一致，并将中间层A的神经元输出作为输入连接到辅助层A。In the second step, an auxiliary layer A (corresponding to the first auxiliary layer described above) is constructed, the structure of which is identical to that of the output layer, and the neuron output of the intermediate layer A is connected as an input to the auxiliary layer A.

第三步、构建第一优化器，其损失函数由给定训练数据时，辅助层的神经元输出值与对应标签进行比较获得。The third step is to construct a first optimizer whose loss function is obtained by comparing the neuron output value of the auxiliary layer with the corresponding tag when the training data is given.

第四步、迭代优化输入层、中间层A以及辅助层A所构成网络的连接权值。The fourth step is to iteratively optimize the connection weights of the network formed by the input layer, the intermediate layer A, and the auxiliary layer A.

第五步、构建辅助层B(相当于上述第二辅助层)，其结构与输出层结构一致，并将中间层C的神经元输出作为输入连接到辅助层B。In the fifth step, an auxiliary layer B (corresponding to the second auxiliary layer described above) is constructed, the structure of which is identical to that of the output layer, and the neuron output of the intermediate layer C is connected as an input to the auxiliary layer B.

第六步、构建第二优化器，其损失函数由给定训练数据时，辅助层B的神经元输出值与对应标签进行比较获得。In the sixth step, a second optimizer is constructed, and when the loss function is given by the training data, the neuron output value of the auxiliary layer B is compared with the corresponding label.

第七步、固定输入层与中间层A所构成网络的连接权值，迭代优化输入层、中间层B、规则化调制层A、中间层C以及辅助层B所构成网络的连接权值。In the seventh step, the connection weight of the network formed by the fixed input layer and the intermediate layer A is iteratively optimized the connection weights of the network formed by the input layer, the intermediate layer B, the regularized modulation layer A, the intermediate layer C, and the auxiliary layer B.

第八步、构建第三优化器，其损失函数由给定训练数据时，输出层的神经元输出值与对应标签进行比较获得。In the eighth step, the third optimizer is constructed, and the loss function is obtained by comparing the neuron output value of the output layer with the corresponding label when the training data is given.

第九步、固定输入层、中间层A、中间层B、规则化调制层A以及中间层C所构成网络的连接权值。迭代优化输入层、中间层D、规则化调制层B、中间层E以及输出层所构成网络的连接权值。The ninth step, the fixed input layer, the intermediate layer A, the intermediate layer B, the regularized modulation layer A, and the intermediate layer C constitute a connection weight of the network. Iteratively optimizes the connection weights of the network formed by the input layer, the intermediate layer D, the regularized modulation layer B, the intermediate layer E, and the output layer.

在该优选实施例中，规则化调制层堆叠了多次，适用于具备多种不同结构化建模需求的场景。In the preferred embodiment, the regularized modulation layer is stacked multiple times and is suitable for scenarios with a variety of different structured modeling requirements.

优选实施例六Preferred embodiment six

在上述优选实施例一、优选实施例二以及优选实施例三的结构化建模中均包含经验设定的超参数，例如：In the structural modeling of the above preferred embodiment 1, the preferred embodiment 2 and the preferred embodiment 3, the empirically set hyperparameters are included, for example:

在优选实施例一中，异常值排除方法中判为异常的阈值(3倍样本标准差)；In a preferred embodiment 1, the abnormal value exclusion method is determined as an abnormal threshold (3 times sample standard deviation);

在优选实施例二中，聚类参数(输出聚类个数3)；In a preferred embodiment 2, the clustering parameter (the number of output clusters is 3);

在优选实施例三中，机器学习模型的超参数(SVM的C、γ)；In a preferred embodiment 3, the hyperparameters of the machine learning model (C, γ of the SVM);

这些超参数通常需要通过经验来设定，当然还可以在训练过程中通过网格搜索的方式择优设定。These hyperparameters usually need to be set by experience, and of course, they can be optimized by grid search during the training process.

此外，在训练过程完成后，辅助层与输出层的输出结果分别表示：In addition, after the training process is completed, the output results of the auxiliary layer and the output layer respectively represent:

(1)基于“局部信号”的推断；(1) Inference based on "local signals";

(2)基于“全局信号+局部信号”的推断。(2) Inference based on "global signal + local signal".

对比上述两种输出结果的在训练数据集上的差异，可用于分析规则化调制层中具体规则对人工神经网络的补充作用：Comparing the differences in the training data sets of the above two output results can be used to analyze the complementary effects of specific rules in the regularized modulation layer to the artificial neural network:

(1)当特定规则对模型结果不存在有益效果时，删除该规则；(1) When a specific rule does not have a beneficial effect on the model result, the rule is deleted;

(2)当特定规则对模型结果有明显改善时，依据专业知识分析该规则的优势和缺陷，进而启发专家设计新的规则。(2) When the specific rules have obvious improvement on the model results, the advantages and defects of the rules are analyzed according to the professional knowledge, and then the experts are inspired to design new rules.

据此，更新规则调制层的内容，重新训练规则嵌入式人工神经网络的参数。在该优选实施例中，通过择优设定超参数，人工神经网络模型可以变得更加准确。Accordingly, the content of the rule modulation layer is updated, and the parameters of the rule embedded artificial neural network are retrained. In the preferred embodiment, the artificial neural network model can be made more accurate by preferentially setting hyperparameters.

优选实施例七Preferred embodiment seven

该优选实施例与优选实施例一的区别在于：在规则嵌入式人工神经网络的训练过程完成后，从验证数据集方面对比辅助层与输出层的推断结果的准确性。若输出层无法比拟辅助层的性能，则说明规则化调制层过多地抑制了人工神经网络的局部分析功能。The difference between the preferred embodiment and the preferred embodiment 1 is that the accuracy of the inference result of the auxiliary layer and the output layer is compared from the verification data set after the training process of the rule embedded artificial neural network is completed. If the output layer cannot compare the performance of the auxiliary layer, it indicates that the regularized modulation layer excessively suppresses the local analysis function of the artificial neural network.

在这种情况下，需要调整训练方法，迫使优化器更好的优化中间层B的参数，从而提高中间层B在输出层中的作用。训练方法调整如下：In this case, the training method needs to be adjusted to force the optimizer to better optimize the parameters of the intermediate layer B, thereby improving the role of the intermediate layer B in the output layer. The training method is adjusted as follows:

在结构化建模过程中将原有的“将极大值集合的元素标记为选定样本点”调整为将极大值集合的元素及其在临近的(0.1秒范围内)的第二极大值均标记为选定样本点。In the structured modeling process, the original "mark the elements of the maximal set as selected sample points" is adjusted to the elements of the set of maxima and its adjacent (second range of 0.1 second) poles Large values are marked as selected sample points.

上述调整在训练过程中发挥使用，应用于数据推断时应该取消，以避免噪声引起的错误推断。The above adjustments are used during the training process and should be eliminated when applied to data inference to avoid false inferences caused by noise.

根据本发明其中一实施例，还提供了一种存储介质，存储介质包括存储的程序，其中，在程序运行时控制存储介质所在设备执行上述规则嵌入式人工神经网络的训练方法。上述存储介质可以包括但不限于：U盘、只读存储器(ROM)、随机存取存储器(RAM)、移动硬盘、磁碟或者光盘等各种可以存储程序代码的介质。According to an embodiment of the present invention, a storage medium is provided. The storage medium includes a stored program, wherein the device in which the storage medium is located controls the training method of the rule embedded artificial neural network. The above storage medium may include, but is not limited to, a U disk, a read only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, or an optical disk, and the like, which can store program codes.

根据本发明其中一实施例，还提供了一种处理器，处理器用于运行程序，其中，程序运行时执行上述规则嵌入式人工神经网络的训练方法。上述处理器可以包括但不限于：微处理器(MCU)或可编程逻辑器件(FPGA)等的处理装置。According to an embodiment of the present invention, there is further provided a processor, wherein the processor is configured to execute a program, wherein the program runs the training method of the rule embedded artificial neural network. The above processor may include, but is not limited to, a processing device such as a microprocessor (MCU) or a programmable logic device (FPGA).

上述本发明实施例序号仅仅为了描述，不代表实施例的优劣。The serial numbers of the embodiments of the present invention are merely for the description, and do not represent the advantages and disadvantages of the embodiments.

在本发明的上述实施例中，对各个实施例的描述都各有侧重，某个实施例中没有详述的部分，可以参见其他实施例的相关描述。In the above-mentioned embodiments of the present invention, the descriptions of the various embodiments are different, and the parts that are not detailed in a certain embodiment can be referred to the related descriptions of other embodiments.

在本申请所提供的几个实施例中，应该理解到，所揭露的技术内容，可通过其它的方式实现。其中，以上所描述的装置实施例仅仅是示意性的，例如所述单元的划分，可以为一种逻辑功能划分，实际实现时可以有另外的划分方式，例如多个单元或组件可以结合或者可以集成到另一个系统，或一些特征可以忽略，或不执行。另一点，所显示或讨论的相互之间的耦合或直接耦合或通信连接可以是通过一些接口，单元或模块的间接耦合或通信连接，可以是电性或其它的形式。In the several embodiments provided by the present application, it should be understood that the disclosed technical contents may be implemented in other manners. The device embodiments described above are only schematic. For example, the division of the unit may be a logical function division. In actual implementation, there may be another division manner, for example, multiple units or components may be combined or may be Integrate into another system, or some features can be ignored or not executed. In addition, the mutual coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interface, unit or module, and may be electrical or otherwise.

所述作为分离部件说明的单元可以是或者也可以不是物理上分开的，作为单元显示的部件可以是或者也可以不是物理单元，即可以位于一个地方，或者也可以分布到多个单元上。可以根据实际的需要选择其中的部分或者全部单元来实现本实施例方案的目的。The units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed to multiple units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiment.

另外，在本发明各个实施例中的各功能单元可以集成在一个处理单元中，也可以是各个单元单独物理存在，也可以两个或两个以上单元集成在一个单元中。上述集成的单元既可以采用硬件的形式实现，也可以采用软件功能单元的形式实现。In addition, each functional unit in each embodiment of the present invention may be integrated into one processing unit, or each unit may exist physically separately, or two or more units may be integrated into one unit. The above integrated unit can be implemented in the form of hardware or in the form of a software functional unit.

所述集成的单元如果以软件功能单元的形式实现并作为独立的产品销售或使用时，可以存储在一个计算机可读取存储介质中。基于这样的理解，本发明的技术方案本质上或者说对现有技术做出贡献的部分或者该技术方案的全部或部分可以以软件产品的形式体现出来，该计算机软件产品存储在一个存储介质中，包括若干指令用以使得一台计算机设备(可为个人计算机、服务器或者网络设备等)执行本发明各个实施例所述方法的全部或部分步骤。The integrated unit, if implemented in the form of a software functional unit and sold or used as a standalone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention, which is essential or contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a storage medium. A number of instructions are included to cause a computer device (which may be a personal computer, server or network device, etc.) to perform all or part of the steps of the methods described in various embodiments of the present invention.

以上所述仅是本发明的优选实施方式，应当指出，对于本技术领域的普通技术人员来说，在不脱离本发明原理的前提下，还可以做出若干改进和润饰，这些改进和润饰也应视为本发明的保护范围。The above description is only a preferred embodiment of the present invention, and it should be noted that those skilled in the art can also make several improvements and retouchings without departing from the principles of the present invention. It should be considered as the scope of protection of the present invention.

Industrial applicability

如上所述，本发明实施例提供的一种规则嵌入式人工神经网络系统及其训练方法具有以下有益效果：结构化建模是对数据全局的建模和理解，克服了前处理方式的技术缺陷。采用两步训练法对神经网络分别进行训练，克服了规则化调制导致的不可微分问题，也克服了后处理纠错方式对规则敏感的问题。由于引入了人类的知识逻辑，因此，不但降低了人工神经网络受噪声影响而犯错的可能性，而且可以减少训练人工神经网络时所需的样本数量。由于规则能够对全局逻辑结构进行约束，人工神经网络只需要聚焦于局部信息结构的统计分析，进而能够降低人工神经网络的复杂度和对计算资源的需求。As described above, a rule embedded artificial neural network system and a training method thereof provided by the embodiments of the present invention have the following beneficial effects: structured modeling is a global modeling and understanding of data, and overcomes technical defects of preprocessing methods. . The two-step training method is used to train the neural network separately, which overcomes the indivisible problem caused by regularized modulation and overcomes the problem that the post-processing error correction method is sensitive to rules. The introduction of human knowledge logic not only reduces the possibility of artificial neural networks being erroneously affected by noise, but also reduces the number of samples required to train artificial neural networks. Since the rules can constrain the global logical structure, the artificial neural network only needs to focus on the statistical analysis of the local information structure, which can reduce the complexity of the artificial neural network and the demand for computing resources.

Claims

A rule embedded artificial neural network system, comprising:

An input layer, a first intermediate layer, a regularization modulation layer, a second intermediate layer, and an output layer, wherein an output end of the input layer is respectively connected to an input end of the first intermediate layer and the second intermediate layer a neuron output end of the first intermediate layer is coupled to an input end of the regularization modulation layer, and an input of the second intermediate layer and the neuron output end of the regularization modulation layer and the output layer The ends are connected.

The system of claim 1 wherein said first intermediate layer and said second intermediate layer comprise one or more layers of neural networks, respectively.

The system of claim 1 further comprising: a third intermediate layer, wherein said second intermediate layer and said neuron output of said regularized modulation layer are merged via said third intermediate layer The input ends of the output layers are connected; or the second intermediate layer is eliminated, and the input layers and the neuron output ends of the regularization modulation layer are merged and then passed through the third intermediate layer and the output layer The inputs are connected.

The system of claim 1 wherein said regularized modulation layer comprises: structured modeling and data transformation, wherein said structured modeling refers to extracting a global structure in the input data based on human expertise logic Characterizing the law of the feature, the data conversion refers to analyzing the sample in the input data by using the global structured feature rule and converting the sample into a neuron output, so that the input data conforms to the global structural characteristic rule The corresponding neurons of the sample produce an activation output and the remaining neurons produce an inactive output.

A method of training a ruled embedded artificial neural network, the method being applied to the rule embedded artificial neural network system according to any one of claims 1 to 4, the method comprising:

Construct training data and corresponding labels;

Constructing a first auxiliary layer, wherein a neuron structure of the first auxiliary layer is the same as a neuron structure of the output layer, and an input end of the first auxiliary layer is connected to a neuron output end of the first intermediate layer;

Constructing a first optimizer and iterating and optimizing a connection weight of the network formed by the input layer, the first intermediate layer and the first auxiliary layer using a first optimizer, wherein the loss function of the first optimizer Obtaining, by the training data, the neuron output value of the first auxiliary layer is compared with a corresponding label;

Constructing a second optimizer, wherein the loss function of the second optimizer is obtained by comparing a neuron output value of the output layer with a corresponding tag when the training data is given;

Securing a connection weight of the input layer and the network formed by the first intermediate layer, and using the second optimizer to iteratively optimize the input layer, the second intermediate layer, the regularization modulation layer, and the output layer The connection weights that make up the network.

The method according to claim 5, wherein when there is a third intermediate layer, a connection weight of the network formed by the input layer and the first intermediate layer is fixed, and iteratively optimized by using the second optimizer The input layer, the second intermediate layer, the regularization modulation layer, the third intermediate layer, and the output layer constitute a connection weight of the network.

The method according to claim 5, wherein when the third intermediate layer is present, the regularization modulation layer, the second intermediate layer, and the third intermediate layer are set as a neural network component unit, and Between the input layer and the output layer and between the first intermediate layer and the output layer, more than one neural network component unit is constructed, wherein the more than one neural network component unit is connected end to end.

The method according to claim 7, wherein when there is more than one neural network constituent unit, the second auxiliary layer is constructed when the parameters of each neural network constituent unit are optimized by training, wherein the The input end of the second auxiliary layer is connected to the output end of the third intermediate layer of the current neural network component unit, and the output shape of the second auxiliary layer is the same as the output layer.

The method according to claim 7, wherein when there is more than one neural network constituent unit, a separate optimizer is separately constructed for each neural network constituent unit, wherein the loss function of each constructed optimizer is The neuron output value of the first auxiliary layer is obtained by comparison with the corresponding tag when the training data is given.

The method according to claim 9, wherein when there is more than one neural network component unit, the plurality of built-in optimizers are used to iteratively optimize the connection rights of each layer of neural networks in each neural network component unit. value.

The method of claim 10, wherein said plurality of separate optimizers are constructed using a gradient-based layer-by-layer greedy optimization algorithm; each constructed optimizer's loss function is to characterize the difference between the output value and the target value a degree of derivable function, wherein the derivable function comprises: a cross entropy distance metric function, a mean squared distance metric function.