A PSO-BFGS neural network training algorithm
Technical field
The invention belongs to the field of high-performance computing and machine learning, and in particular relates to a PSO-BFGS neural network training algorithm.
Background technique
An artificial neural network is an information processing system that can learn an arbitrary input-output relationship from a body of data and build an accurate model of it. At present, one of the major challenges facing artificial neural networks is training. Before training, the network carries no information; through training, the weight values of the neural network are determined, so that an accurate model is established from the training data. Determining the neural network weight values is an optimization process: various optimization algorithms iteratively compute ever more accurate weight values. The optimization algorithms traditionally used for neural network training are mainly gradient-based methods such as gradient descent, quasi-Newton algorithms (QN) and conjugate gradient algorithms (CG). These algorithms converge quickly, but tend to fall into local convergence. To address this problem, some scholars have introduced the PSO algorithm, which has stronger global search ability, to train neural networks; however, the PSO algorithm is not only very time-consuming but also converges more slowly than gradient-based methods. Therefore, to balance global search ability against convergence speed, it has been proposed to combine the PSO algorithm with the BFGS quasi-Newton algorithm. However, as algorithm complexity rises, the neural network training process becomes ever more time-consuming.
With the rapid development of GPU technology, current GPUs possess very strong parallel computing power and data processing capability, but the logic processing capability of a GPU still differs considerably from that of a CPU. There is therefore an urgent need to consider speed improvement, method scalability and design cost together.
Summary of the invention
The purpose of the invention is to overcome the shortcomings of the prior art by providing a PSO-BFGS neural network training algorithm, namely a parallel neural network training algorithm based on the particle swarm optimization (PSO) algorithm and the BFGS quasi-Newton algorithm, which uses a GPU as the computing device for neural network training, so as to alleviate the problem of local convergence of traditional artificial neural networks during training while maintaining convergence speed.
The purpose of the present invention is achieved through the following technical solutions:
A PSO-BFGS neural network training algorithm, comprising the following steps:
(1) Task division: the PSO-BFGS neural network training algorithm comprises computing tasks and control tasks. The control tasks are completed by the CPU; the computing tasks, comprising the PSO algorithm and the BFGS quasi-Newton algorithm, are completed by the GPU. The PSO algorithm is divided into four kernel functions (kernels), and the BFGS algorithm is divided into five kernel functions (kernels).
(2) The algorithm takes the PSO algorithm as its backbone, with the BFGS algorithm serving as its auxiliary. The PSO algorithm is responsible for the global search; after a number of iterations, the optimal solution it produces is passed to the BFGS algorithm as input for a local fine search. The objective function of both the PSO algorithm and the BFGS algorithm is the neural network training error evaluation function.
(3) Division of the degree of parallelism for the PSO algorithm and the BFGS quasi-Newton algorithm.
(4) Implementation of the neural network training error evaluation function in both the PSO algorithm and the BFGS algorithm: in the PSO algorithm, each work-item performs its calculation over all of the training data, while in the BFGS quasi-Newton algorithm each work-item calculates the values of the output neurons corresponding to one group of training data.
(5) Parallel implementation of the PSO algorithm;
(6) Parallel implementation of the BFGS quasi-Newton algorithm;
(7) Parallel reduction.
The control tasks include: controlling the transfer of initial data from the host side to the computing device side and of calculation results from the computing device side back to the host side; data transfer between the PSO and BFGS algorithms; judging whether the iteration count has reached its upper limit; judging whether the training error meets the requirement; and controlling whether the computing tasks terminate.
Step (5) specifically comprises the following steps:
1) Generate random numbers: this kernel function is a parallel implementation of the linear congruential random number generation method. The host side first produces an initial random number seed and passes it to the GPU side, which generates random numbers generation by generation from that initial seed; the seed of each generation is the random number of the previous generation. In the parallel implementation, each work-item generates the random numbers needed for one group of connection weights.
2) Update particle velocity and position: this kernel function uses the random numbers from 1), the global best particle position and the particle's own historical best position to update the particle velocity, then uses the updated velocity together with the previous particle position to update the particle position coordinates.
3) Calculate the neural network training error.
4) Update the global best and the particle historical best positions: the neural network training error of the previous historical best position is compared with that of the newly produced particle position, and the smaller of the two is taken as the particle's historical best position coordinates. Each thread calculates the historical best position of one group of weights, and the global best position is produced by parallel reduction.
Step (6) specifically comprises the following steps:
1) Calculate the neural network training error.
2) Calculate the search direction: using the Hessian matrix H and the gradient g, each work-item reads one row of the matrix H and performs a multiply-accumulate operation with the gradient g to produce one element of the search direction dir.
3) Determine the step length by the golden section method: first initialize a step-length interval and calculate the two golden section points of that interval; take the two points as candidate step lengths to update w; then call the E_T function to compute the value corresponding to each of the two w vectors and compare them, keeping the smaller one and discarding the part of the interval outside the larger one. This is repeated until the remaining step-length interval is shorter than a fixed value, at which point the iteration stops and the step length is determined.
4) Calculate the gradient of the neural network training error function: compute in turn the partial derivative of each element of every group of weights w, then sum the partial derivatives of each element using the parallel reduction method to obtain the gradient g of the neural network training error function.
5) Calculate the Hessian matrix H: from the weights w obtained in 3) and the gradient g obtained in 4), calculate s and z by formulas (5) and (6), then calculate x from these two vectors according to formula (7), where s is the change in the weights, z is the change in the gradient, and x is the correction term. Finally the Hessian matrix is calculated according to formula (8). In this part, each work-item calculates the elements of one column of the Hessian matrix H. H_k denotes the Hessian matrix H at the k-th iteration.
s_k = w_(k+1) - w_k    (5)
z_k = g_(k+1) - g_k    (6)
Compared with the prior art, the beneficial effects brought by the technical solution of the present invention are as follows:
1) The present invention uses a GPU as the computing device, and the running speed is greatly improved compared with a traditional CPU-based implementation.
2) The present invention uses OpenCL as the programming language; compared with an implementation in the CUDA language, it has higher portability and can be used on GPUs and FPGAs from different manufacturers.
3) The present invention uses a hybrid of the PSO algorithm and the BFGS quasi-Newton algorithm as the optimization algorithm for neural network training; compared with other optimization algorithms, it combines high convergence efficiency with global search ability.
Brief description of the drawings
Fig. 1 is a schematic diagram of the neural network structure.
Fig. 2 is a schematic diagram of parallel reduction.
Fig. 3 is a design flow chart of the parallel PSO-BFGS neural network training algorithm.
Fig. 4 is a flow chart of the concrete operations.
Fig. 5 is a schematic diagram of the results of the specific embodiment.
Specific embodiment
The invention will be further described with reference to the accompanying drawings.
Fig. 1 is a schematic diagram of the neural network structure, comprising an input layer, a hidden layer and an output layer. Every neuron is connected to every neuron in the adjacent layer, and each connection corresponds to one weight. The number of input layer neurons corresponds to the number of inputs in one group of training data, the number of output layer neurons corresponds to the number of outputs in one group of training data, and the number of hidden layer neurons is set as needed, generally larger than the number of input layer neurons.
Fig. 3 is the design flow chart of the algorithm of the invention, which is specified as follows:
1. Task division: the PSO-BFGS neural network training algorithm comprises two classes of tasks, namely computing tasks and control tasks. The control tasks are completed by the CPU and include: controlling the transfer of initial data from the host side to the computing device side and of calculation results from the computing device side back to the host side; data transfer between the PSO and BFGS algorithms; judging whether the iteration count has reached its upper limit; judging whether the training error meets the requirement; and controlling whether the computing tasks terminate. The computing tasks are completed by the GPU and comprise the PSO algorithm and the BFGS quasi-Newton algorithm. The PSO algorithm is implemented as four kernel functions (kernels) and the BFGS algorithm as five kernel functions (kernels).
2. Algorithm structure: the PSO-BFGS neural network training algorithm takes the PSO algorithm as its backbone, with the BFGS algorithm serving as its auxiliary. The PSO algorithm is mainly responsible for the global search; after a certain number of iterations, the optimal solution it produces serves as the input to BFGS, and the BFGS algorithm then performs a local fine search. The neural network training error evaluation function serves as the objective function of both the PSO algorithm and the BFGS algorithm. The detailed process is shown in Fig. 3. The present invention divides the overall tasks according to the characteristics of the CPU and GPU: the judgment parts and the data initialization are all completed by the CPU, while the PSO algorithm and the BFGS quasi-Newton algorithm are both implemented on the GPU. The division of the degree of parallelism of each module takes into account both the characteristics of the algorithms themselves and the architectural features of the GPU.
3. Division of the degree of parallelism: for the two parts in step 1, parallel implementation on the GPU first requires dividing the degree of parallelism, i.e. determining the total number of work-items (work items) needed to complete these computing tasks and the number of work-groups into which these work-items are organized. The degree of parallelism of the PSO algorithm is divided mainly according to its particles, and its four kernels divide the degree of parallelism on the same principle: for example, if the PSO algorithm contains 1024 particles, then each of its kernels requires 1024 work-items to participate in the computation, with each work-item carrying out the calculations related to one particle. In the five kernels of the BFGS quasi-Newton algorithm, the calculation of the objective function and of its gradient divides the degree of parallelism according to the scale of the training data: for example, if the training data consists of 1024 groups, then these two kernels need 1024 work-items, each work-item carrying out the calculations related to one group of training data. The other kernels divide the degree of parallelism according to the number of neural network weights: the neural network shown in Fig. 1 has 32 connection weights, so under this network structure the number of work-items needed by these three kernels is 32.
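The sizing rules above can be summarized in a minimal sketch; the helper names below are hypothetical, not from the patent, and only restate the counting scheme:

```python
def pso_work_items(num_particles):
    # Every PSO kernel uses one work-item per particle.
    return num_particles

def bfgs_work_items(num_training_groups, num_weights):
    # Objective-function and gradient kernels use one work-item per
    # training group; the remaining kernels use one per connection weight.
    return {"objective": num_training_groups, "other": num_weights}

# The worked example from the text: 1024 particles, 1024 training
# groups, and the 32-weight network of Fig. 1.
pso = pso_work_items(1024)
bfgs = bfgs_work_items(1024, 32)
```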
4. Neural network training error evaluation function: although the neural network training error function is the common objective function of the PSO and BFGS quasi-Newton algorithms, the two algorithms divide the degree of parallelism differently, and so, to improve operating efficiency, this function has different implementations in the PSO algorithm and in the BFGS algorithm.
1) In the PSO algorithm, since each particle corresponds to one group of weights, and a corresponding training error must be calculated for each group of weights, each work-item calculates the error corresponding to one group of weights. In this algorithm, each work-item needs to use all of the training data for its calculation.
2) In the BFGS quasi-Newton algorithm, only the error corresponding to a single group of weights needs to be calculated, so each work-item calculates the values of the output neurons corresponding to one group of training data, and the training error is finally obtained by the parallel reduction technique. This kernel reads the training data and the connection weights w from memory, performs the calculation, and finally obtains the training error E_T(w) corresponding to w. In this function, the calculation performed by the s-th work-item is shown in formulas (1), (2) and (3), where the value of an input neuron is the training data, written x_i^s, meaning the value of the s-th group of training data corresponding to the i-th input neuron. The values of the hidden neurons are calculated by formulas (1) and (2), where f(h_j) denotes the value of the j-th hidden neuron, w_ij denotes the weight from the i-th input neuron to the j-th hidden neuron, and n denotes the number of input neurons. The value of an output neuron y_m is calculated by formula (3), where m denotes the m-th output neuron and N denotes the number of hidden neurons. Finally, the neural network training error E_T is obtained by the reduction technique according to formula (4). In formula (4), S_T denotes the scale of the training data set, N_y denotes the number of output neurons, y_m denotes the calculated result of the m-th output neuron, d_km denotes the ideal output of the k-th group of training data for the m-th output neuron, d_max,m denotes the maximum ideal output in the given training data, and d_min,m denotes the minimum ideal output in the given training data. In this function, each work-item need only use one group of training data for its calculation.
5. Parallel implementation of the PSO algorithm:
1) Random number generation: this kernel function is a parallel implementation of the linear congruential random number generation method. The host side first produces an initial random number seed and passes it to the GPU side, which generates random numbers generation by generation from that initial seed. The seed of each generation is the random number of the previous generation. In the parallel implementation, each work-item generates the random numbers needed for one group of connection weights.
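A serial sketch of the linear congruential scheme described above, where each new seed is the previous random number; the multiplier and increment are common textbook constants, assumed here since the patent gives none:

```python
def lcg(seed, a=1664525, c=1013904223, m=2**32):
    # Linear congruential generator: x_{n+1} = (a * x_n + c) mod m.
    # The seed of each generation is the random number of the previous
    # generation, as in step 1).
    while True:
        seed = (a * seed + c) % m
        yield seed

# In the parallel version, each work-item would run its own such stream
# from a host-supplied initial seed.
stream = lcg(42)
first_three = [next(stream) for _ in range(3)]
```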
2) Update particle velocity and position: this kernel function uses the random numbers from 1), the global best particle position and the particle's own historical best position to update the particle velocity, then uses the updated velocity together with the previous particle position to update the particle position coordinates. Each thread updates the velocity and position parameters of one group of connection weights.
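The update rule is not written out in the text; the sketch below uses the standard PSO form (inertia plus personal-best and global-best pulls), with coefficient values that are common defaults rather than values from the patent:

```python
def update_particle(pos, vel, pbest, gbest, r1, r2,
                    inertia=0.729, c1=1.49445, c2=1.49445):
    # Standard PSO update (assumed form): new velocity from inertia,
    # the pull toward the particle's historical best position, and the
    # pull toward the global best position; r1, r2 are the random
    # numbers from step 1). The position then moves by the new velocity.
    new_vel = [inertia * v + c1 * r1 * (pb - x) + c2 * r2 * (gb - x)
               for v, x, pb, gb in zip(vel, pos, pbest, gbest)]
    new_pos = [x + v for x, v in zip(pos, new_vel)]
    return new_pos, new_vel
```

In the parallel version each thread would apply this to one group of connection weights.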
3) Calculate the neural network training error: see 1) of step 4.
4) Update the local best and the particle historical best positions: the generation of a particle's historical best position is a recursive process; it only requires comparing the neural network training error of the previous historical best position with that of the newly produced particle position, and the smaller of the two is taken as the particle's historical best position coordinates. Each thread calculates the historical best position of one group of weights. The generation of the global best position requires the parallel reduction technique; its detailed process is shown in Fig. 2.
6. Parallel implementation of the BFGS quasi-Newton algorithm:
1) Calculate the neural network training error: the specific calculation process is as described in 2) of step 4.
2) Calculate the search direction: this calculation uses the Hessian matrix H and the gradient g; each work-item reads one row of the matrix H and performs a multiply-accumulate operation with the gradient g to produce one element of the search direction dir.
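The per-row multiply-accumulate can be sketched as follows; the leading minus sign assumes the usual quasi-Newton convention dir = -H g with H approximating the inverse Hessian, which the patent does not state:

```python
def search_direction(H, g):
    # Each "work-item" computes one element of dir by multiply-
    # accumulating one row of H with the gradient g; the minus sign
    # (assumed convention) makes dir a descent direction.
    return [-sum(h * gi for h, gi in zip(row, g)) for row in H]
```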
3) Determine the step length by the golden section method: the process is first to initialize a step-length interval and calculate the two golden section points of that interval, then to take the two points as candidate step lengths to update w, then to call the E_T function to compute the value corresponding to each of the two w vectors and compare them, keeping the smaller one and discarding the part of the interval outside the larger one. This is repeated until the remaining step-length interval is shorter than a fixed value, at which point the iteration stops and the step length is determined. Since the main purpose of this kernel is to update the weights w, the degree of parallelism inside the function is divided on the dimension of w, i.e. each work-item calculates one element of the weight vector w. This function calls the neural network training error evaluation function repeatedly, which requires calling one kernel from inside another; this can be realized with OpenCL versions higher than 1.1. However, since the degrees of parallelism of the two kernels are not identical, the total work-item counts are the same but the active work-items are not. To avoid the resulting work-item conflicts, all work-items are forced to synchronize before each call to the training error function.
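The interval-shrinking loop of step 3) can be sketched serially; `f` stands in for the training error evaluated at the candidate step length:

```python
def golden_section_step_length(f, lo, hi, tol=1e-5):
    # Golden-section line search as in step 3): evaluate f at the two
    # golden section points of [lo, hi], keep the point with the smaller
    # value, and discard the part of the interval outside the larger
    # one, until the interval is shorter than tol.
    phi = (5 ** 0.5 - 1) / 2  # golden ratio conjugate, ~0.618
    while hi - lo > tol:
        a = hi - phi * (hi - lo)
        b = lo + phi * (hi - lo)
        if f(a) < f(b):
            hi = b  # minimum lies in [lo, b]
        else:
            lo = a  # minimum lies in [a, hi]
    return (lo + hi) / 2
```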
4) Calculate the gradient g of the neural network training error evaluation function: the process is mainly to compute in turn the partial derivative of each element of every group of weights w, then to sum the partial derivatives of each element using the parallel reduction method, which yields the gradient of the neural network training error function. In this function, each work-item calculates the partial derivatives of all elements of the weight vector w on the basis of one group of training data, and finally a reduction summation is carried out over all work-items.
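The per-group-then-sum structure of step 4) can be sketched as follows; central differences stand in for the partial derivatives, since the patent does not give the derivative formulas:

```python
def gradient(error_fn, w, data_groups, eps=1e-6):
    # Each "work-item" computes the partial derivatives of every element
    # of w for one training group (here by a central difference, an
    # assumption), then the per-group gradients are summed, mimicking
    # the reduction summation over all work-items.
    per_group = []
    for group in data_groups:
        grad = []
        for i in range(len(w)):
            wp = list(w); wp[i] += eps
            wm = list(w); wm[i] -= eps
            grad.append((error_fn(wp, group) - error_fn(wm, group)) / (2 * eps))
        per_group.append(grad)
    return [sum(g[i] for g in per_group) for i in range(len(w))]
```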
5) Calculate the Hessian matrix H: in all formulas, the subscript k denotes the k-th iteration. From the calculated weights w and gradient g, s and z are calculated by formulas (5) and (6); then x is calculated from these two vectors according to formula (7), where s is the change in the weights, z is the change in the gradient, and x is the correction term. Finally the Hessian matrix is calculated according to formula (8). In this part, each work-item calculates the elements of one column of the Hessian matrix H. H_k is the Hessian matrix H of the k-th iteration.
s_k = w_(k+1) - w_k    (5)
z_k = g_(k+1) - g_k    (6)
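Formulas (7) and (8) are not reproduced in the text; the sketch below implements formulas (5) and (6) as given and substitutes the standard textbook BFGS inverse-Hessian update for the missing formulas, so the update rule itself is an assumption:

```python
def bfgs_update(H, w_new, w_old, g_new, g_old):
    # Formulas (5)(6): s_k = w_(k+1) - w_k, z_k = g_(k+1) - g_k.
    # Stand-in for (7)(8), the standard BFGS inverse-Hessian update:
    #   H' = (I - r s z^T) H (I - r z s^T) + r s s^T,  r = 1 / (z^T s).
    n = len(w_new)
    s = [w_new[i] - w_old[i] for i in range(n)]
    z = [g_new[i] - g_old[i] for i in range(n)]
    r = 1.0 / sum(z[i] * s[i] for i in range(n))
    A = [[(1.0 if i == j else 0.0) - r * s[i] * z[j] for j in range(n)]
         for i in range(n)]
    AH = [[sum(A[i][k] * H[k][j] for k in range(n)) for j in range(n)]
          for i in range(n)]
    # H' = A H A^T + r s s^T; in the parallel version each work-item
    # would produce one column of H'.
    return [[sum(AH[i][k] * A[j][k] for k in range(n)) + r * s[i] * s[j]
             for j in range(n)] for i in range(n)]
```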
7. Parallel reduction: the PSO-BFGS neural network training parallel algorithm needs the parallel reduction technique to handle certain summation and minimum-finding operations. Fig. 2 shows the operation of finding a minimum value using parallel reduction: the first work-item compares the first variable with the last variable and replaces the larger number with the smaller one, the second work-item compares the second variable with the second-to-last variable, and so on; the final result is obtained by the first work-item.
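The comparison pattern of Fig. 2 can be mirrored serially as follows; each loop pass plays the role of one synchronized round of work-items:

```python
def parallel_reduce_min(values):
    # Mirrors Fig. 2: in each round, "work-item" i compares element i
    # with element n-1-i and keeps the smaller in the front half; the
    # active range then halves until the minimum sits in element 0.
    vals = list(values)
    n = len(vals)
    while n > 1:
        for i in range(n // 2):
            if vals[n - 1 - i] < vals[i]:
                vals[i] = vals[n - 1 - i]
        n = (n + 1) // 2  # an odd middle element carries over unchanged
    return vals[0]
```

A summation reduction follows the same pattern with addition in place of the comparison.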
The operating process of the specific implementation is shown in Fig. 4:
(1) Neural network and particle swarm parameter setting
Fig. 1 shows the structure of the neural network. This embodiment uses a single-hidden-layer neural network. The number of input layer neurons is set according to the number of input variables of the model to be fitted, the number of output layer neurons is set according to the number of output variables of the model to be fitted, and the number of hidden layer neurons is set as needed, generally larger than the number of input layer neurons. The number of neural network weights is calculated automatically from the neuron counts set above according to the formula w = (input + output) * hidden. The particle swarm parameters are mainly the particle scale, the particle dimension, and the particle velocity and position boundaries. The particle dimension is identical to the number of neural network weights and is set automatically by the system once the neural network structure has been set; the particle scale refers to the number of particles used in the PSO algorithm and can be set freely as required; the particle velocity and position boundaries can likewise be set as required.
(2) GPU-side parameter setting
The GPU-side parameter setting mainly comprises the setting of the work-item count and the setting of the work-group scale. A work-item is the smallest unit that runs on the GPU, and its count is set according to the larger of the training data scale and the particle scale. For example, if the training data scale is 4096 groups and the particle scale is 1024, then the work-item count takes the larger of the two and is set to 4096. A certain number of work-items can be organized into a work-group, which facilitates data transfer and management between work-items within a group. The work-item count within a work-group can be set according to the number of neural network weights: for the neural network shown in Fig. 1, the weight count is (3 + 1) * 8 = 32, so the number of work-items in one work-group is set to 32.
(3) Training data import
The software can only read files in csv format, so the training data must first be stored in a csv file, which is then placed under the software directory with its filename changed to training-data.
(4) Termination condition setting
After the above steps are completed, the software termination conditions must also be set, generally comprising the setting of the maximum iteration count and the setting of the training error boundary condition. Once these are set, the software terminates the program and outputs the results when it reaches the maximum iteration count or when the training error falls below the training error boundary condition.
(5) Result recording
The output results are shown in Fig. 5 and comprise four pieces of information. The first line is the training error, indicating the accuracy of the neural network's fit to the model, 29.4891 in the figure; the second line is the iteration count, i.e. the number of iterations from program start to termination, 20 in the figure; the third line onwards gives the neural network weights obtained; the last line is the running time, in seconds.
The present invention is not limited to the embodiments described above. The above description of the specific embodiments is intended to describe and illustrate the technical solution of the present invention; the embodiments described above are only schematic and not restrictive. Without departing from the purpose of the invention and the scope of the claimed protection, those skilled in the art may, under the inspiration of the present invention, make many specific transformations of form, all of which fall within the scope of protection of the present invention.