
CN112819139A - Optimal conversion method from artificial neural network to spiking neural network - Google Patents


Info

Publication number
CN112819139A
CN112819139A
Authority
CN
China
Prior art keywords
neural network
layer
artificial neural
ann
snn
Prior art date
Legal status
Pending
Application number
CN202110111807.2A
Other languages
Chinese (zh)
Inventor
顾实
邓师旷
Current Assignee
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Priority date
Filing date
Publication date
Application filed by University of Electronic Science and Technology of China
Priority to CN202110111807.2A
Publication of CN112819139A
Legal status: Pending


Classifications

    • G — PHYSICS
    • G06 — COMPUTING OR CALCULATING; COUNTING
    • G06N — COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 — Computing arrangements based on biological models
    • G06N 3/02 — Neural networks
    • G06N 3/04 — Architecture, e.g. interconnection topology
    • G06N 3/049 — Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • G06N 3/08 — Learning methods


Abstract

The invention discloses an optimal conversion method from an artificial neural network (ANN) to a spiking neural network (SNN). The method comprises: setting up a source artificial neural network ann that meets the requirements and setting an upper limit for its activation function; training the source ann while recording the maximum output value of each layer; constructing a spiking neural network snn with the same network structure as the source ann but with the per-layer activation functions removed; copying the network weights weight and biases bias of each layer of the source ann into the corresponding layer of the snn, and using the recorded maximum output value a^l of each layer of the source ann as the threshold V_th^l = a^l of the corresponding snn layer; and setting a desired simulation time T and, according to the formula δ^l = V_th^l/(2T), increasing the bias of each layer of the snn by the offset δ^l so that the conversion error is minimized. Through the above scheme, the present invention attains the theoretical minimum of the conversion error, and has high practical and popularization value.

Description

An optimal conversion method from artificial neural network to spiking neural network

Technical Field

The invention belongs to the technical field of neural networks, and specifically relates to an optimal conversion method from an artificial neural network to a spiking neural network.

Background

Spiking neural networks (SNNs) are known as a new generation of neural network models. They are composed of spiking neural units that simulate biological neurons and are well suited to processing discrete spike signals. A spiking neuron emits a single spike only when activated; the spike amplitude is constant, and the information is carried in the timing and frequency of the spikes. The unique spike-based signal transmission mechanism and event-driven characteristics allow SNNs to run inference extremely fast on suitable hardware, with energy consumption far below that of traditional artificial neural networks [1]. However, because spike trains are discrete, SNNs cannot be trained directly with backpropagation, so training a complex and efficient SNN is a recognized difficulty in the field.

There are three main directions for training spiking neural networks: biologically inspired methods [2,3], surrogate gradients [4,5,6], and conversion from artificial neural networks (ANN-to-SNN) [7,8]. They differ in the network depth they suit and the simulation time they require: surrogate gradients and biologically inspired methods suit shallower, simpler SNNs, while network conversion applies to more complex SNNs; biologically inspired and conversion methods need long simulation times (usually more than 1000 steps), whereas surrogate gradients can complete complex tasks with fewer than 100 steps. The conversion method achieves the best task performance of the three and can often come very close to the artificial neural network, but because of the long simulation time required, its running speed and energy consumption after deployment in hardware are worse than those of surrogate-gradient methods. Recently, threshold balancing has become the most common conversion method. It first builds a conventional neural network with the same structure as the target SNN, using the ReLU activation function; the constructed source network is first trained on the training set, and the maximum activation value output by each layer is recorded. After training, the weights of the source network are copied directly into the SNN, and finally the recorded maximum activation values are set as the membrane-potential thresholds of the corresponding SNN layers, completing an SNN with reliable performance. The information passed between the layers of the source network is discretized in the SNN, with the spike frequency approximately equal to that information value.

Although the network conversion method performs best on a variety of complex tasks, a gap to artificial neural networks remains, and the long simulation time weakens the energy-saving advantage of spiking neural networks. A series of methods have been proposed to mitigate these shortcomings, such as: threshold balancing with the 99.9th-percentile maximum [7]; adjusting the threshold using the spike frequency between two layers [8]; using a soft-reset mechanism [8] (subtracting the threshold from the membrane potential after a spike) instead of a hard reset (zeroing the membrane potential after a spike); and hybrid training [9] (network conversion followed by surrogate-gradient training). However, these methods only aim to increase the spike frequency and reduce the residual membrane potential between layers, lack a theoretical basis explaining why the conversion error decreases, and the currently known simulation results on complex datasets such as ImageNet remain unsatisfactory (in simulation time or task performance). In addition, frequency adjustment [8] and hybrid training [9] add non-negligible overhead to the training or conversion steps. How to solve these problems of the prior art is therefore an urgent issue for those skilled in the art.

The references mentioned in the background are as follows:

1. Kim S, Park S, Na B, et al. Spiking-YOLO: spiking neural network for energy-efficient object detection. Proceedings of the AAAI Conference on Artificial Intelligence, 2020, 34(07): 11270-11277.

2. Caporale N, Dan Y. Spike timing-dependent plasticity: a Hebbian learning rule. Annu. Rev. Neurosci., 2008, 31: 25-46.

3. Kheradpisheh S R, Ganjtabesh M, Thorpe S J, et al. STDP-based spiking deep convolutional neural networks for object recognition. Neural Networks, 2018, 99: 56-67.

4. Shrestha S B, Orchard G. SLAYER: spike layer error reassignment in time. Advances in Neural Information Processing Systems, 2018, 31: 1412-1421.

5. Wu Y, Deng L, Li G, et al. Direct training for spiking neural networks: faster, larger, better. Proceedings of the AAAI Conference on Artificial Intelligence, 2019, 33: 1311-1318.

6. Neftci E O, Mostafa H, Zenke F. Surrogate gradient learning in spiking neural networks. IEEE Signal Processing Magazine, 2019, 36: 61-63.

7. Rueckauer B, Lungu I A, Hu Y, et al. Conversion of continuous-valued deep networks to efficient event-driven networks for image classification. Frontiers in Neuroscience, 2017, 11: 682.

8. Han B, Srinivasan G, Roy K. RMP-SNN: residual membrane potential neuron for enabling deeper high-accuracy and low-latency spiking neural network. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020: 13558-13567.

9. Rathi N, Srinivasan G, Panda P, et al. Enabling deep spiking neural networks with hybrid conversion and spike timing dependent backpropagation. International Conference on Learning Representations, 2020.

10. Sengupta A, Ye Y, Wang R, et al. Going deeper in spiking neural networks: VGG and residual architectures. Frontiers in Neuroscience, 2019, 13: 95.

Summary of the Invention

In order to overcome the above deficiencies in the prior art, the present invention provides an optimal conversion method from an artificial neural network to a spiking neural network, such that the task performance of the target spiking neural network approaches that of the source artificial neural network while requiring only a simulation time close to that of surrogate-gradient methods.

In order to achieve the above object, the technical scheme adopted by the present invention is as follows:

An optimal conversion method from an artificial neural network to a spiking neural network, comprising the following steps:

(S1) Set up a source artificial neural network ann that meets the requirements, and set an upper limit for its activation function;

(S2) Train the source artificial neural network ann, and record the maximum output value of each layer of the network;

(S3) Construct a spiking neural network snn in which every layer has the same network model as the corresponding layer of the source artificial neural network ann, with the activation function of each layer removed;

(S4) Copy the network weights weight and biases bias of each layer of the source artificial neural network ann into the corresponding layer of the spiking neural network snn, and use the recorded maximum output value a^l of each layer of the source ann as the threshold V_th^l = a^l of the corresponding layer of the snn;

(S5) Set a desired simulation time T and, according to the formula

δ^l = V_th^l/(2T) = a^l/(2T),

increase the bias bias of each layer of the spiking neural network snn by the offset δ^l, so that the conversion error is minimized.
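Steps (S1)-(S5) can be sketched end-to-end on a toy single-neuron layer (an illustrative sketch; the function names and toy numbers are assumptions, not from the patent):

```python
def ann_layer(W, b, x, a_max):
    """Source ANN layer: weighted sum with a clipped-ReLU activation capped at a_max."""
    s = sum(wi * xi for wi, xi in zip(W, x)) + b
    return min(max(s, 0.0), a_max)

def convert(b, a_max, T):
    """Steps (S4)-(S5): threshold <- recorded max a^l; bias shifted by V_th/(2T)."""
    v_th = a_max
    return b + v_th / (2 * T), v_th

def snn_rate(W, b_snn, v_th, x, T):
    """Soft-reset IF neuron driven for T steps; returns its average output."""
    drive = sum(wi * xi for wi, xi in zip(W, x)) + b_snn
    v, spikes = 0.0, 0
    for _ in range(T):
        v += drive
        if v >= v_th:
            v -= v_th          # soft reset: subtract the threshold after a spike
            spikes += 1
    return spikes * v_th / T

W, b, x, T = [0.5, 0.5], 0.0, [0.4, 0.2], 10
a_max = 1.0                                # maximum layer output recorded in step (S2)
b_snn, v_th = convert(b, a_max, T)
print(ann_layer(W, b, x, a_max))           # ~0.3
print(snn_rate(W, b_snn, v_th, x, T))      # ~0.3: the converted SNN matches the ANN
```

With 10 time steps the converted neuron's spike rate already reproduces the ANN output on this toy input.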

Specifically, the requirements in step (S1) comprise three points: first, the activation function must remove the negative part of its input; second, the source artificial neural network ann must use average pooling layers; third, batch normalization layers must not be used in the source network ann.

Compared with the prior art, the present invention has the following beneficial effects:

(1) There is almost no performance gap between the target spiking neural network snn of the present invention and the source artificial neural network ann on the same task; the thresholding operation can even improve the performance of some artificial neural networks that do not use batch normalization. Since the present invention optimizes the conversion error theoretically, it achieves the same effect as the prior art with only about 1/10 of the simulation time, which is a great help for implementing a spiking neural network snn with high performance and low energy consumption in hardware. In addition, compared with the prior art, the present invention derives the conversion error theoretically and attains its theoretical minimum even when the test set is unknown.

Description of Drawings

FIG. 1 is a schematic diagram of the thresholding and shift operations applied to the activation function according to the present invention.

FIG. 2 is a schematic diagram of converting ANN neurons into SNN neurons through threshold balancing according to the present invention.

Detailed Description

The present invention is further described below with reference to the accompanying drawings and embodiments. The embodiments of the present invention include, but are not limited to, the following examples.

Embodiment

The artificial neural network required for conversion to a spiking neural network has three structural requirements. First, the activation function must remove the negative part of its input: as shown in FIG. 1, the ReLU function is set directly to 0 in the second quadrant, while its graph in the first quadrant resembles the spike-frequency function of a spiking neural network, providing a basis for conversion. Second, the source artificial neural network must use average pooling layers rather than the more common max pooling layers, because after conversion to a spiking neural network the spike amplitude is constant, so a max pooling layer has no pooling ability over spikes; for example, for two different spike inputs within one kernel of a max pooling layer (say, a window containing a single 1 versus a window of all 1s), both outputs are 1, with no discriminative power. Third, batch normalization layers cannot be used in the source network, because batch normalization increases the variance of the activation distribution, which makes noise spikes more likely in the early phase of the simulation.
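The pooling argument can be checked directly (a toy sketch; the window contents are illustrative):

```python
# Two different spike patterns falling into the same 2x2 pooling window.
# Spikes have constant amplitude 1, so only their count carries information.
window_a = [1, 0, 0, 0]   # one spike in the window
window_b = [1, 1, 1, 1]   # four spikes in the window

print(max(window_a), max(window_b))          # max pooling: 1 and 1 -> indistinguishable
print(sum(window_a) / 4, sum(window_b) / 4)  # average pooling: 0.25 vs 1.0
```

Max pooling collapses both windows to the same value, while average pooling preserves the spike-count information the SNN relies on.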

The optimal conversion method from an artificial neural network to a spiking neural network of the present invention comprises training steps and conversion steps.

Training steps:

(S1) Set up a source artificial neural network ann satisfying the above three requirements, and use a ReLU-like activation function h; that is, after each layer of the network receives the input of the previous layer, its output is x^l = h(W^l·x^{l-1}), where h is a nonlinear transformation that retains the part of the pre-activation value greater than 0. In the conversion method, an upper limit is set on the activation function h, for instance ReLU6 as used in MobileNet, which caps ReLU at 6: x = clip(x, 0, 6). Since the spiking neural network snn and the source artificial neural network ann receive the same image data as input, suppose the source ann is a network with only one layer, i.e., x^{l-1} is the image pixel values; we want the snn to finally produce the same output as the source ann, so that the snn and the ann perform identically on the task. Let h′ denote the activation function of the spiking neural network, i.e., x′^l = h′(W^l·x^{l-1}). With identical inputs and weights, making the outputs the same requires adjusting h′ without changing the operating principle of the spiking neural network snn.

(S2) Train the source artificial neural network ann, and record the maximum output value of each layer of the network; for example, if the maximum output of layer l is a^l, then in FIG. 1 the threshold is in fact V_th = a^l.
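Steps (S1)-(S2) can be sketched as follows (a minimal illustration; the layer, the toy data, and the cap of 6.0 are assumptions, following the ReLU6 example):

```python
def relu6(x):
    """Clipped ReLU used as the source activation h: x = clip(x, 0, 6)."""
    return min(max(x, 0.0), 6.0)

def layer_output(W, b, x):
    """One fully connected layer with the clipped activation."""
    return [relu6(sum(w * xi for w, xi in zip(row, x)) + b) for row in W]

# During training, track the maximum output a_l of the layer over the data;
# after step (S4) this value becomes the threshold V_th of the SNN layer.
W, b = [[0.5, -0.2], [1.0, 1.0]], 0.1
data = [[0.3, 0.6], [0.9, 0.1], [4.0, 4.0]]
a_l = 0.0
for x in data:
    a_l = max(a_l, max(layer_output(W, b, x)))
print(a_l)  # 6.0: the [4.0, 4.0] sample saturates relu6, so the cap is recorded
```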

Conversion steps:

(S3) Construct a spiking neural network snn in which every layer has the same network model as the corresponding layer of the source artificial neural network ann, with the activation function of each layer removed.

A membrane potential matrix M^l with the same shape as the output is then set up to store the neurons' membrane potentials v^l(t). The input to a neuron of the spiking neural network (snn) is a sequence of spike trains; in the present invention, the input to the first layer is the pixel values of the picture, while all other layers receive spike trains. For example, with the spike amplitude of the current layer set to 1, at each moment its output can only be 1 or 0: if the simulation time T is 10 and the output of the ann neuron corresponding to a given neuron is 0.3, then that neuron will spike 3 times within T, so its expected total spike rate is 3/10 = 0.3, the same as the output of the ann. A spiking neuron maintains a membrane potential internally; it accumulates the membrane-potential increments (W^l·x′^l) and releases a spike once the potential exceeds the threshold V_th.

(S4) As shown in FIG. 2, copy the network weights weight and biases bias of each layer of the ann into the corresponding layer of the spiking neural network snn, and use the recorded maximum output value a^l of each layer of the source ann as the threshold V_th^l = a^l of the corresponding snn layer. In this way, the spiking neural network snn can run according to formula (1):

θ^l(t+1) = V_th^l, if v^l(t) + W^l·x′^l(t+1) ≥ V_th^l; otherwise θ^l(t+1) = 0   (1)

At each moment, the input is the pixel values of the picture and the output is spikes; the position with the highest output spike frequency gives the class of the picture.

The present invention uses the IF model [1] and the soft-reset mechanism [2] of spiking neural networks. Let the membrane potential of a layer-l neuron of the network at time t be v^l(t), and let the value of the received spike train at time t+1 be x′^l(t+1). After the neuron receives the spike signal, the change of its membrane potential follows formula (1), where W^l is the connection weight between layer-l and layer-(l−1) neurons. When v^l(t) + W^l·x′^l(t+1) exceeds the threshold potential V_th, the neuron releases a spike of amplitude equal to the threshold; θ^l(t+1) denotes the spike value at time t+1, which can only equal 0 or V_th.

Combining the soft-reset mechanism (the threshold V_th is subtracted from the membrane potential after a spike), the update formula of the membrane potential is obtained:

v^l(t+1) = v^l(t) + W^l·x′^l(t+1) − θ^l(t+1)   (2)
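Equations (1)-(2) can be exercised on the example from the text above: an ANN output of 0.3 with T = 10 should yield 3 spikes (a minimal sketch; exact fractions are used because the final threshold crossing lands exactly on V_th, which floating point can miss):

```python
from fractions import Fraction

def if_step(v, drive, v_th):
    """One step of eqs. (1)-(2): accumulate the input, then spike with soft reset."""
    v = v + drive                        # v^l(t) + W^l . x'^l(t+1)
    theta = v_th if v >= v_th else 0     # eq. (1): spike of amplitude V_th, or nothing
    return v - theta, theta              # eq. (2): soft reset subtracts the threshold

v, spikes = Fraction(0), 0
for _ in range(10):                      # T = 10, constant drive 0.3, V_th = 1
    v, theta = if_step(v, Fraction(3, 10), Fraction(1))
    spikes += 1 if theta else 0
print(spikes)  # 3 -> average output 3/10 = 0.3, matching the ANN neuron
```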

Then the above formula is summed from time 0 to time T, and both sides of the equation are divided by T, yielding formula (3):

(v^l(T) − v^l(0))/T = W^l·(1/T)·Σ_{t=1}^{T} x′^l(t) − (1/T)·Σ_{t=1}^{T} θ^l(t)   (3)

Finally, let x̄^l = (1/T)·Σ_{t=1}^{T} x′^l(t) denote the average input of layer l of the network from time 0 to time T, so that the expected output of layer l is a′^l = (1/T)·Σ_{t=1}^{T} θ^l(t). Setting the initial membrane potential v^l(0) of the network to 0, the relation between the expected output of the spiking neural network and the average input is obtained:

a′^l = W^l·x̄^l − v^l(T)/T   (4)
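Relation (4) holds exactly, by telescoping the update (2) over the T steps; this can be checked numerically (a toy check; the random drive stands in for W^l·x′^l(t)):

```python
import random

def run_if_neuron(drives, v_th):
    """Soft-reset IF neuron over a drive sequence; returns (average output, final v)."""
    v, total = 0.0, 0.0
    for z in drives:              # z plays the role of W^l . x'^l(t)
        v += z
        if v >= v_th:
            v -= v_th             # soft reset, as in eq. (2)
            total += v_th
    return total / len(drives), v

random.seed(0)
T, v_th = 50, 1.3
drives = [random.uniform(0.0, 0.8) for _ in range(T)]
avg_out, v_T = run_if_neuron(drives, v_th)
# eq. (4): expected output = average input - residual membrane potential / T
print(abs(avg_out - (sum(drives) / T - v_T / T)) < 1e-9)  # True
```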

Here the last part of the formula, v^l(T)/T, is the information still remaining in the membrane potential when the simulation ends, while the first part resembles the forward pass of an artificial neural network; −v^l(T)/T can therefore be viewed as the effect of a special activation function, and the above equation can be rewritten with the following activation function h′(·):

a′^l = h′(W^l·x̄^l),  with  h′(z) = (V_th/T)·clip(⌊z·T/V_th⌋, 0, T)   (5)

where ⌊·⌋ denotes rounding down and clip is the truncation function that cuts off the parts below 0 and above T.
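The staircase form (5) can be verified against a direct IF-neuron simulation (a sketch; the particular z values are illustrative, chosen away from exact threshold crossings to avoid floating-point boundary effects):

```python
import math

def h_prime(z, v_th, T):
    """Staircase activation of eq. (5): h'(z) = (V_th/T) * clip(floor(z*T/V_th), 0, T)."""
    n = math.floor(z * T / v_th)
    return (v_th / T) * min(max(n, 0), T)

def if_rate(z, v_th, T):
    """Average output of a soft-reset IF neuron under constant drive z for T steps."""
    v, spikes = 0.0, 0
    for _ in range(T):
        v += z
        if v >= v_th:
            v -= v_th
            spikes += 1
    return spikes * v_th / T

for z in (-0.2, 0.35, 0.73, 1.4):
    print(z, h_prime(z, 1.0, 10), if_rate(z, 1.0, 10))  # the two columns agree
```

Negative drives produce no spikes (the clip at 0), and drives above V_th saturate at one spike per step (the clip at T), matching the two truncation regimes of (5).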

Correspondingly, using h(·) to denote the activation function of the artificial neural network yields formula (6):

a^{l+1} = h(W^l·a^l)   (6)

In each layer, the expectation is that the corresponding outputs a′^{l+1} and a^{l+1} of the two networks become as close as possible, but two kinds of discrepancy keep them always some distance apart. The first is the gap between the two activation functions: as shown in FIG. 1, for pre-activation values greater than 0, the curve of h is a straight line while the curve of h′ is a staircase, so a gap always remains. The second is that the input of the current layer differs between the ann and the snn: except for the first layer, which receives the image pixels, every layer carries an error caused by the activation function, and as these errors accumulate, the gap between the inputs gradually grows.

The loss function is an important function used in network training; it is generally the mean squared error or the softmax function. For example, the mean squared error is (a^L − y)^2, where a^L is the output of the last layer of the network and y is the desired value. In handwritten digit recognition, say, with an input picture showing 3, y = [0,0,1,0,0,0,0,0,0]; if the output a^L of the present invention is [0,0.1,0.9,0,0,0.1,0,0.2,0.2], the network finds that the neuron in the third position has the highest activation frequency and recognizes the digit written in the picture as 3. The present invention holds that the performance gap between the converted spiking neural network and the source artificial neural network is caused by the conversion error, which can be taken as the difference between the loss functions ℓ(a′^L) and ℓ(a^L) of the two networks:

Err = ℓ(a′^L) − ℓ(a^L)

For the l-th layer of the spiking neural network, its activation function can be approximated as the ANN activation function h plus an error term: a′^l = h′^l(W^l·a′^{l-1}) = h^l(W^l·a′^{l-1}) + Δa′^l, where Δa′^l is the output error produced, under identical inputs, by the difference between the two activation functions. Because of error accumulation, however, the inputs of every layer except the input layer also differ, so the present invention decomposes the output error of each layer into a part Δa′^l caused by the activation function and a part caused by the differing input Δa^{l-1}:

Δa^l := a′^l − a^l = Δa′^l + [h^l(W^l·a′^{l-1}) − h^l(W^l·a^{l-1})] ≈ Δa′^l + B^l·W^l·Δa^{l-1}   (7)

where the final approximate equality denotes a first-order Taylor expansion, and B^l is the diagonal matrix of the first derivative of the activation function h^l. Meanwhile, the present invention expands the expression of the conversion error with the second-order Taylor formula:

ℓ(a′^L) − ℓ(a^L) ≈ ∇ℓ(a^L)^T·Δa^L + (1/2)·Δa^{L T}·H_{a^L}·Δa^L   (8)

where H_{a^L} is the Hessian matrix. Since ℓ(a^L) is the loss function of the source neural network ann and is minimized during training, its gradient, and hence the first term, can be considered very close to 0 and be ignored. The conversion error therefore mainly resides in the second term; substituting expression (7) for the output error Δa^l into the second term of (8) gives:

E[Δa^{L T}·H_{a^L}·Δa^L] = E[Δa′^{L T}·H_{a^L}·Δa′^L] + 2·E[Δa^{L−1 T}·W^{L T}·B^L·H_{a^L}·Δa′^L] + E[Δa^{L−1 T}·W^{L T}·B^L·H_{a^L}·B^L·W^L·Δa^{L−1}]   (9)

where the middle interaction term can be neglected under a decoupling assumption, and from previous work [3] one can take H_{a^{L−1}} = W^{L T}·B^L·H_{a^L}·B^L·W^L; substituting this into the last term of (9) gives:

E[Δa^{L T}·H_{a^L}·Δa^L] ≈ E[Δa′^{L T}·H_{a^L}·Δa′^L] + E[Δa^{L−1 T}·H_{a^{L−1}}·Δa^{L−1}]   (10)

Finally, unrolling this recursion forward layer by layer, the complex network conversion error is decomposed into a sum of per-layer terms E[Δa′^{l T}·H_{a^l}·Δa′^l]. The terms in formula (10) are pairwise independent, so it suffices to minimize each E[Δa′^{l T}·H_{a^l}·Δa′^l] individually to minimize the overall network conversion error.

In FIG. 1, the stepped dashed line is the expected activation function h′ of the spiking neural network, while the straight line is the ReLU activation function h used by the source network. The two rise in a similar way, but since h′ can never exceed V_th, the error becomes very large once the pre-activation value is far greater than V_th. The present invention therefore first considers a threshold operation: an upper bound y_th is set for the ReLU function, so that when the activation value is too large the output is held constant at y_th and the derivative (gradient) is 0, which bounds the conversion error for overly large activations. The second is a shift operation: the present invention considers translating h′ or h by a distance δ, i.e. increasing or decreasing the bias term b in the network. Treating H_{a_l} in formula (10) as a constant, what must be minimized is E[Δa′_l^T·Δa′_l], i.e. the error caused by the activation functions under identical input. The present invention achieves this with a shift transformation δ; in layer l, only the following quantity needs to be minimized:

E_z[h′_l(z−δ) − h_l(z)]^2    (11)

where the pre-activation value z = W·a′_{l-1} is uniformly distributed within each period [(t−1)V_th/T, t·V_th/T] (t = 1, 2, …, T) of FIG. 1; that is, the present invention needs to minimize the area enclosed by the two curves within one period:

δ* = argmin_δ Σ_{t=1}^{T} ∫_{(t−1)V_th/T}^{t·V_th/T} [h′_l(z−δ) − h_l(z)]^2 dz    (12)
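The minimization in (11)–(12) can be reproduced by a brute-force grid search. The sketch below is hedged: h is taken as the identity on [0, V_th] (the ReLU region), h′ as a clipped floor staircase, the shift is applied as an increase of the pre-activation (matching the bias increase of step (S5)), and E_z is estimated by uniform sampling. The minimizer should land near V_th/(2T):

```python
import math

V_TH, T = 1.0, 8
STEP = V_TH / T                     # width of one staircase period

def h(z):                           # source activation on the relevant range: identity (ReLU region)
    return max(0.0, z)

def h_snn(z):                       # assumed SNN rate activation: clipped floor staircase
    return min(V_TH, STEP * math.floor(max(0.0, z) / STEP))

def sq_err(shift, n=4000):
    """Estimate E_z[h'(z + shift) - h(z)]^2 for z uniform on [0, V_TH]."""
    zs = ((i + 0.5) * V_TH / n for i in range(n))
    return sum((h_snn(z + shift) - h(z)) ** 2 for z in zs) / n

# Grid-search the shift over one period [0, V_TH / T]
grid = [i * STEP / 100 for i in range(101)]
best_shift = min(grid, key=sq_err)
print(best_shift)   # should land near V_TH / (2 * T) = 0.0625
```

The search is one-dimensional and the objective is a simple quadratic in the shift, so a coarse grid already recovers the optimum to within its own resolution.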

(S5) Set an expected simulation time T; according to formula (12), the bias of each layer l in the spiking neural network snn is increased by the offset V_th^l/(2T), so that the conversion error is minimized.

Differentiation then shows that the conversion error is minimal when the shift equals V_th/(2T) (the bold line in FIG. 1), in which case the conversion error is:

min_δ E_z[h′_l(z−δ) − h_l(z)]^2 = V_th^2/(12T^2)    (13)

Through this theoretical derivation, the expression of the minimum conversion error is obtained: the minimum error is proportional to the square of the threshold V_th and inversely proportional to the square of the simulation time T. This is consistent with the rule observed in previous work: a larger simulation time and smaller artificial-neural-network activation values make the converted spiking neural network perform better.
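The constants in (13) follow from elementary integration. Assuming the error e = h′(z−δ) − h(z) sweeps uniformly over an interval of length Δ = V_th/T whose position is controlled by the shift s, the mean squared error is:

```latex
\begin{aligned}
f(s) &= \frac{1}{\Delta}\int_{s-\Delta}^{s} e^{2}\,\mathrm{d}e
      = \frac{s^{3}-(s-\Delta)^{3}}{3\Delta}
      = s^{2}-s\Delta+\frac{\Delta^{2}}{3},\\
f'(s) &= 2s-\Delta = 0 \;\Rightarrow\; s^{*}=\frac{\Delta}{2}=\frac{V_{th}}{2T},\\
f(s^{*}) &= \frac{\Delta^{2}}{12} = \frac{V_{th}^{2}}{12\,T^{2}}.
\end{aligned}
```

Setting f′(s) = 0 recovers the optimal shift V_th/(2T), and evaluating f there gives the minimum error V_th²/(12T²) — the familiar uniform-quantization result.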

To verify the above conversion method of the present application, the applicant carried out simulation verification; the specific simulation steps are as follows:

(A1) The input of the converted snn is either the raw picture pixel values or a spike train obtained by converting the picture into spikes, e.g. with a Poisson method.
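The spike encoding mentioned in (A1) is commonly approximated in discrete time by Bernoulli sampling: at every step each pixel fires with probability equal to its (normalized) intensity. A minimal sketch — the function name and the rate scaling are assumptions, not from the patent:

```python
import random

def poisson_encode(pixels, T, seed=0):
    """Encode normalized pixels in [0, 1] as a T-step binary spike train.
    Discrete-time approximation of a Poisson process: each pixel fires at
    each step with probability equal to its intensity (Bernoulli sampling)."""
    rng = random.Random(seed)
    return [[1 if rng.random() < p else 0 for p in pixels] for _ in range(T)]

pixels = [0.0, 0.25, 0.9]
train = poisson_encode(pixels, T=2000)
rates = [sum(step[i] for step in train) / len(train) for i in range(len(pixels))]
print(rates)   # each empirical rate should be close to the corresponding intensity
```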

(A2) Forward process (formula (1)): at each time step t, the network model of the l-th layer receives a spike input matrix whose entries can only be 0 or V_th^{l−1}. The input is passed through the current layer to compute an output value a′_l, and the membrane-potential matrix M_l is increased by a′_l. The positions of M_l that exceed the threshold potential V_th^l are then determined; those entries are decreased by V_th^l, the remaining potential is stored for the next time step, and a spike of size V_th^l is emitted at each such position. Finally, the spike matrix O_t emitted by the output layer is recorded.
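The per-position membrane update in (A2) is an integrate-and-fire neuron with soft reset (reset by subtraction). A minimal scalar sketch with assumed variable names:

```python
def if_step(membrane, input_current, vth):
    """One step of an integrate-and-fire neuron with soft reset:
    integrate the input; if the membrane reaches vth, subtract vth
    (keeping only the excess) and emit a spike of magnitude vth."""
    membrane += input_current
    if membrane >= vth:
        return membrane - vth, vth
    return membrane, 0.0

m, n_spikes = 0.0, 0
for _ in range(10):              # constant drive of 0.35 with vth = 1.0
    m, s = if_step(m, 0.35, 1.0)
    n_spikes += int(s > 0)
print(n_spikes)   # 3 spikes in 10 steps: rate 0.3, slightly under the drive 0.35
                  # (the floor-type error that the shift of step (S5) compensates)
```

The soft reset is what preserves the residual potential between steps, so no input charge is discarded across the simulation.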

(A3) Step (A2) is repeated for T time steps; the output spike matrices O_t of all time steps are accumulated and divided by T, yielding the output firing rate of snn, i.e. the result of formula (5). This rate is very close to the output of the source ann (formula (6)), so the snn can perform the same task as the ann.
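Steps (A2)–(A3) together amount to: simulate T steps, accumulate the emitted spikes, and divide by T. The single-neuron sketch below (same soft-reset assumptions as above) shows the rate approaching the clipped ANN activation to within about V_th/T:

```python
def snn_rate(input_current, vth, T):
    """Simulate one soft-reset IF neuron for T steps and return its firing rate
    (accumulated spike output divided by T, as in formula (5))."""
    membrane, acc = 0.0, 0.0
    for _ in range(T):
        membrane += input_current
        if membrane >= vth:
            membrane -= vth
            acc += vth          # each spike contributes vth to the output sum
    return acc / T

vth = 1.0
for x in (0.2, 0.55, 1.7):
    ann_out = min(max(0.0, x), vth)   # clipped ReLU output of the source ann
    print(x, snn_rate(x, vth, T=400), ann_out)
# each snn rate matches the ann output to within about vth / T
```

The saturating case (x = 1.7) also illustrates why the threshold operation matters: inputs above V_th can only ever be represented as the maximum rate.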

(A4) The applicant verified the method on classic networks such as VGG-16 and ResNet-20: on the datasets CIFAR-10, CIFAR-100 and ImageNet, a simulation length of only about 400 achieves almost the same performance as the source ann, far shorter than that required by other currently known methods.

Through the above operations, the present invention obtains a spiking neural network whose task performance is very close to that of the source artificial neural network; after porting the spiking neural network to a suitable hardware platform, the corresponding task can be completed accurately, quickly and with low energy consumption. The key points of the present invention are the threshold operation in step (S1) and the shift operation in step (S5). Without the threshold operation, the maximum output values of the network layers would differ greatly, the thresholds of the converted spiking neural network would become excessively large, and a great many neurons would fail to activate. The shift operation halves the expected maximum conversion error, which in practice reduces the output gap between the target spiking neural network and the source artificial neural network under identical input.

The above embodiments are merely preferred embodiments of the present invention and do not limit its protection scope; any variation made by adopting the design principles of the present invention, together with non-inventive work on that basis, shall fall within the protection scope of the present invention.

The references mentioned in the embodiments of the present invention are as follows:

1. Barbi M, Chillemi S, Di Garbo A, et al. Stochastic resonance in a sinusoidally forced LIF model with noisy threshold [J]. BioSystems, 2003, 71(1-2): 23-28.

2. Han B, Srinivasan G, Roy K. RMP-SNN: Residual membrane potential neuron for enabling deeper high-accuracy and low-latency spiking neural network [C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020: 13558-13567.

3. Botev A, Ritter H, Barber D. Practical Gauss-Newton optimisation for deep learning [C]// Proceedings of the 34th International Conference on Machine Learning, PMLR 70. 2017: 557-565.

4. Sengupta A, Ye Y, Wang R, et al. Going deeper in spiking neural networks: VGG and residual architectures [J]. Frontiers in Neuroscience, 2019, 13: 95.

Claims (2)

1. An optimal conversion method from an artificial neural network to a spiking neural network, comprising the steps of:
(S1) setting a source artificial neural network ann that meets the requirement, and setting an upper limit to the activation function;
(S2) training the source artificial neural network ann and recording the maximum value of each layer output of the neural network;
(S3) constructing a spiking neural network snn that is identical to the source artificial neural network ann for each layer of the network model, and removing the activation function for each layer;
(S4) copying the network weight and the bias for each layer in the source artificial neural network ann into the corresponding layer of the spiking neural network snn, and recording the maximum output value a_l of each layer of the source artificial neural network ann as the threshold V_th^l of the corresponding layer of the spiking neural network snn;
(S5) setting a desired simulation time T and, according to the shift formula δ = V_th^l/(2T), increasing the bias of each layer in the spiking neural network snn by the offset V_th^l/(2T), so that the conversion error is minimized.
2. The method for converting an artificial neural network into a spiking neural network as claimed in claim 1, wherein the requirement in step (S1) includes three points: first, the activation function has the effect of removing the negative-value portion; second, the source artificial neural network ann must use average pooling layers; third, batch-normalization layers cannot be used in the source artificial neural network ann.
CN202110111807.2A 2021-01-27 2021-01-27 Optimal conversion method from artificial neural network to impulse neural network Pending CN112819139A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110111807.2A CN112819139A (en) 2021-01-27 2021-01-27 Optimal conversion method from artificial neural network to impulse neural network


Publications (1)

Publication Number Publication Date
CN112819139A true CN112819139A (en) 2021-05-18

Family

ID=75859832


Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113435577A (en) * 2021-06-25 2021-09-24 安徽知陉智能科技有限公司 Gradient function learning framework replacement method based on training deep pulse neural network
CN114723038A (en) * 2022-03-03 2022-07-08 北京大学 Model conversion method, device, device and storage medium based on target detection
CN116503787A (en) * 2023-05-10 2023-07-28 电子科技大学 Handwritten Digit Recognition Method Based on Lightweight Spiking Neural Network Based on Tensor Decomposition
CN117037287A (en) * 2023-10-08 2023-11-10 武汉理工大学 A behavior recognition method, system and device based on 3D impulse neural network
CN120409560A (en) * 2025-07-03 2025-08-01 北京大学 Model training methods, model inference methods and equipment

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180121802A1 (en) * 2016-11-02 2018-05-03 Samsung Electronics Co., Ltd. Method of converting neural network and recognition apparatus using the same
CN112116010A (en) * 2020-09-21 2020-12-22 中国科学院自动化研究所 A classification method for ANN-SNN conversion based on membrane potential preprocessing


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
ANGUO ZHANG ET AL: "Intrinsic Plasticity for Online Unsupervised Learning Based on Soft-Reset Spiking Neuron Model", IEEE Transactions on Cognitive and Developmental Systems *
PETER U. DIEHL ET AL: "Fast-classifying, high-accuracy spiking deep networks through weight and threshold balancing", 2015 International Joint Conference on Neural Networks (IJCNN) *


Similar Documents

Publication Publication Date Title
CN112819139A (en) Optimal conversion method from artificial neural network to impulse neural network
Li et al. A free lunch from ANN: Towards efficient, accurate spiking neural networks calibration
CN113449864B (en) Feedback type impulse neural network model training method for image data classification
CN110555523B (en) Short-range tracking method and system based on impulse neural network
Datta et al. Can deep neural networks be converted to ultra low-latency spiking neural networks?
CN107340993B (en) Computing device and method
US11501131B2 (en) Neural network hardware accelerator architectures and operating method thereof
US11966843B2 (en) Methods and apparatus for distributed training of a neural network
WO2020019236A1 (en) Loss-error-aware quantization of a low-bit neural network
CN111985523A (en) A 2-exponential power deep neural network quantization method based on knowledge distillation training
KR20180007657A (en) Method for neural network and apparatus perform same method
CN113159286A (en) High-precision low-delay pulse neural network conversion method
CN113159287A (en) Distributed deep learning method based on gradient sparsity
CN103544528A (en) BP neural-network classification method based on Hadoop
CN113935475A (en) Simulation and training method of pulse neural network with pulse time offset
CN108596078A (en) A kind of seanoise signal recognition method based on deep neural network
WO2021109588A1 (en) Data processing method, apparatus, electronic device, and readable storage medium
CN113469262A (en) Incremental learning method based on Fisher information matrix
CN115860062A (en) A neural network quantization method and device suitable for FPGA
Li et al. Layered mixed-precision training: a new training method for large-scale AI models
CN111401405B (en) An image classification method and system integrated with multiple neural networks
CN117830799A (en) Training brain-like gesture recognition model, gesture category recognition method and related device
Yang et al. CS-QCFS: Bridging the performance gap in ultra-low latency spiking neural networks
CN117710789A (en) Pulse neural network continuous learning target recognition system based on selective activation
CN113033782B (en) Training method and system for handwriting digital recognition model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210518
