A PSO-BFGS neural network training algorithm
Technical field
The invention belongs to the field of high-performance computing and machine learning, and in particular relates to a PSO-BFGS neural network training algorithm.
Background technique
An artificial neural network is an information processing system that can learn an arbitrary input-output relationship from a body of data and build an accurate model of it. At present, one of the major challenges facing artificial neural networks is training. Before training, the network carries no information; through training, the weight values of the neural network are determined, so that an accurate model is established from the training data. Determining the neural network weight values is an optimization process: various optimization algorithms iteratively compute ever more accurate weight values. The optimization algorithms traditionally used for neural network training are mainly gradient-based methods such as gradient descent, quasi-Newton algorithms (QN) and conjugate gradient algorithms (CG). These algorithms converge quickly, but tend to fall into local convergence. To address this problem, some scholars have introduced the PSO algorithm, which has stronger global search ability, to train neural networks; however, the PSO algorithm is not only very time-consuming but also converges more slowly than gradient-based methods. Therefore, to balance global search ability against convergence speed, it has been proposed to combine the PSO algorithm with the BFGS quasi-Newton algorithm. However, as algorithm complexity rises, the neural network training process becomes ever more time-consuming.
With the rapid development of GPU technology, current GPUs possess very strong parallel computing power and data processing capability, but the logic processing capability of a GPU still differs considerably from that of a CPU. There is therefore an urgent need to consider speed improvement, method scalability and design cost together.
Summary of the invention
The purpose of the invention is to overcome the shortcomings of the prior art by providing a PSO-BFGS neural network training algorithm, namely a parallel neural network training algorithm based on the particle swarm optimization (PSO) algorithm and the BFGS quasi-Newton algorithm, which uses a GPU as the computing device for neural network training, so as to alleviate the problem of local convergence of traditional artificial neural networks during training while maintaining convergence speed.
The purpose of the present invention is achieved through the following technical solutions:
A PSO-BFGS neural network training algorithm, comprising the following steps:
(1) Task division: the PSO-BFGS neural network training algorithm comprises computing tasks and control tasks. The control tasks are completed by the CPU; the computing tasks, comprising the PSO algorithm and the BFGS quasi-Newton algorithm, are completed by the GPU. The PSO algorithm is divided into four kernel functions (kernels), and the BFGS algorithm is divided into five kernel functions (kernels).
(2) The algorithm takes the PSO algorithm as its backbone, with the BFGS algorithm serving as its auxiliary. The PSO algorithm is responsible for the global search; after a number of iterations, the optimal solution it produces is passed to the BFGS algorithm as input for a local fine search. The objective function of both the PSO algorithm and the BFGS algorithm is the neural network training error evaluation function.
(3) Division of the degree of parallelism for the PSO algorithm and the BFGS quasi-Newton algorithm.
(4) Implementation of the neural network training error evaluation function in both the PSO algorithm and the BFGS algorithm: in the PSO algorithm, each work-item performs its calculation over all of the training data, while in the BFGS quasi-Newton algorithm each work-item calculates the values of the output neurons corresponding to one group of training data.
(5) Parallel implementation of the PSO algorithm;
(6) Parallel implementation of the BFGS quasi-Newton algorithm;
(7) Parallel reduction.
The control tasks include: controlling the transfer of initial data from the host side to the computing device side and of calculation results from the computing device side back to the host side; data transfer between the PSO and BFGS algorithms; judging whether the iteration count has reached its upper limit; judging whether the training error meets the requirement; and controlling whether the computing tasks terminate.
Step (5) specifically comprises the following steps:
1) Generate random numbers: this kernel function is a parallel implementation of the linear congruential random number generation method. The host side first produces an initial random number seed and passes it to the GPU side, which generates random numbers generation by generation from that initial seed; the seed of each generation is the random number of the previous generation. In the parallel implementation, each work-item generates the random numbers needed for one group of connection weights.
2) Update particle velocity and position: this kernel function uses the random numbers from 1), the global best particle position and the particle's own historical best position to update the particle velocity, then uses the updated velocity together with the previous particle position to update the particle position coordinates.
3) Calculate the neural network training error.
4) Update the global best and the particle historical best positions: the neural network training error of the previous historical best position is compared with that of the newly produced particle position, and the smaller of the two is taken as the particle's historical best position coordinates. Each thread calculates the historical best position of one group of weights, and the global best position is produced by parallel reduction.
Step (6) specifically comprises the following steps:
1) Calculate the neural network training error.
2) Calculate the search direction: using the Hessian matrix H and the gradient g, each work-item reads one row of the matrix H and performs a multiply-accumulate operation with the gradient g to produce one element of the search direction dir.
3) Determine the step length by the golden section method: first initialize a step-length interval and calculate the two golden section points of that interval; take the two points as candidate step lengths to update w; then call the E_T function to compute the value corresponding to each of the two w vectors and compare them, keeping the smaller one and discarding the part of the interval outside the larger one. This is repeated until the remaining step-length interval is shorter than a fixed value, at which point the iteration stops and the step length is determined.
4) Calculate the gradient of the neural network training error function: compute in turn the partial derivative of each element of every group of weights w, then sum the partial derivatives of each element using the parallel reduction method to obtain the gradient g of the neural network training error function.
5) Calculate the Hessian matrix H: from the weights w obtained in 3) and the gradient g obtained in 4), calculate s and z by formulas (5) and (6), then calculate x from these two vectors according to formula (7), where s is the change in the weights, z is the change in the gradient, and x is the correction term. Finally the Hessian matrix is calculated according to formula (8). In this part, each work-item calculates the elements of one column of the Hessian matrix H. H_k denotes the Hessian matrix H at the k-th iteration.
s_k = w_(k+1) - w_k    (5)
z_k = g_(k+1) - g_k    (6)
Compared with the prior art, the beneficial effects brought by the technical solution of the present invention are as follows:
1) The present invention uses a GPU as the computing device, and the running speed is greatly improved compared with a traditional CPU-based implementation.
2) The present invention uses OpenCL as the programming language; compared with an implementation in the CUDA language, it has higher portability and can be used on GPUs and FPGAs from different manufacturers.
3) The present invention uses a hybrid of the PSO algorithm and the BFGS quasi-Newton algorithm as the optimization algorithm for neural network training; compared with other optimization algorithms, it combines high convergence efficiency with global search ability.
Brief description of the drawings
Fig. 1 is a schematic diagram of the neural network structure.
Fig. 2 is a schematic diagram of parallel reduction.
Fig. 3 is a design flow chart of the parallel PSO-BFGS neural network training algorithm.
Fig. 4 is a flow chart of the concrete operations.
Fig. 5 is a schematic diagram of the results of the specific embodiment.
Specific embodiment
The invention will be further described with reference to the accompanying drawings.
Fig. 1 is a schematic diagram of the neural network structure, comprising an input layer, a hidden layer and an output layer. Every neuron is connected to every neuron in the adjacent layer, and each connection corresponds to one weight. The number of input layer neurons corresponds to the number of inputs in one group of training data, the number of output layer neurons corresponds to the number of outputs in one group of training data, and the number of hidden layer neurons is set as needed, generally larger than the number of input layer neurons.
Fig. 3 is the design flow chart of the algorithm of the invention, which is specified as follows:
1. Task division: the PSO-BFGS neural network training algorithm comprises two classes of tasks, namely computing tasks and control tasks. The control tasks are completed by the CPU and include: controlling the transfer of initial data from the host side to the computing device side and of calculation results from the computing device side back to the host side; data transfer between the PSO and BFGS algorithms; judging whether the iteration count has reached its upper limit; judging whether the training error meets the requirement; and controlling whether the computing tasks terminate. The computing tasks are completed by the GPU and comprise the PSO algorithm and the BFGS quasi-Newton algorithm. The PSO algorithm is implemented as four kernel functions (kernels) and the BFGS algorithm as five kernel functions (kernels).
2. Algorithm structure: the PSO-BFGS neural network training algorithm takes the PSO algorithm as its backbone, with the BFGS algorithm serving as its auxiliary. The PSO algorithm is mainly responsible for the global search; after a certain number of iterations, the optimal solution it produces serves as the input to BFGS, and the BFGS algorithm then performs a local fine search. The neural network training error evaluation function serves as the objective function of both the PSO algorithm and the BFGS algorithm. The detailed process is shown in Fig. 3. The present invention divides the overall tasks according to the characteristics of the CPU and GPU: the judgment parts and the data initialization are all completed by the CPU, while the PSO algorithm and the BFGS quasi-Newton algorithm are both implemented on the GPU. The division of the degree of parallelism of each module takes into account both the characteristics of the algorithms themselves and the architectural features of the GPU.
3. Division of the degree of parallelism: for the two parts in step 1, parallel implementation on the GPU first requires dividing the degree of parallelism, i.e. determining the total number of work-items (work items) needed to complete these computing tasks and the number of work-groups into which these work-items are organized. The degree of parallelism of the PSO algorithm is divided mainly according to its particles, and its four kernels divide the degree of parallelism on the same principle: for example, if the PSO algorithm contains 1024 particles, then each of its kernels requires 1024 work-items to participate in the computation, with each work-item carrying out the calculations related to one particle. In the five kernels of the BFGS quasi-Newton algorithm, the calculation of the objective function and of its gradient divides the degree of parallelism according to the scale of the training data: for example, if the training data consists of 1024 groups, then these two kernels need 1024 work-items, each work-item carrying out the calculations related to one group of training data. The other kernels divide the degree of parallelism according to the number of neural network weights: the neural network shown in Fig. 1 has 32 connection weights, so under this network structure the number of work-items needed by these three kernels is 32.
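The sizing rules above can be summarized in a minimal sketch; the helper names below are hypothetical, not from the patent, and only restate the counting scheme:

```python
def pso_work_items(num_particles):
    # Every PSO kernel uses one work-item per particle.
    return num_particles

def bfgs_work_items(num_training_groups, num_weights):
    # Objective-function and gradient kernels use one work-item per
    # training group; the remaining kernels use one per connection weight.
    return {"objective": num_training_groups, "other": num_weights}

# The worked example from the text: 1024 particles, 1024 training
# groups, and the 32-weight network of Fig. 1.
pso = pso_work_items(1024)
bfgs = bfgs_work_items(1024, 32)
```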
4. Neural network training error evaluation function: although the neural network training error function is the common objective function of the PSO and BFGS quasi-Newton algorithms, the two algorithms divide the degree of parallelism differently, and so, to improve operating efficiency, this function has different implementations in the PSO algorithm and in the BFGS algorithm.
1) In the PSO algorithm, since each particle corresponds to one group of weights, and a corresponding training error must be calculated for each group of weights, each work-item calculates the error corresponding to one group of weights. In this algorithm, each work-item needs to use all of the training data for its calculation.
2) In the BFGS quasi-Newton algorithm, only the error corresponding to a single group of weights needs to be calculated, so each work-item calculates the values of the output neurons corresponding to one group of training data, and the training error is finally obtained by the parallel reduction technique. This kernel reads the training data and the connection weights w from memory, performs the calculation, and finally obtains the training error E_T(w) corresponding to w. In this function, the calculation performed by the s-th work-item is shown in formulas (1), (2) and (3), where the value of an input neuron is the training data, written x_i^s, meaning the value of the s-th group of training data corresponding to the i-th input neuron. The values of the hidden neurons are calculated by formulas (1) and (2), where f(h_j) denotes the value of the j-th hidden neuron, w_ij denotes the weight from the i-th input neuron to the j-th hidden neuron, and n denotes the number of input neurons. The value of an output neuron y_m is calculated by formula (3), where m denotes the m-th output neuron and N denotes the number of hidden neurons. Finally, the neural network training error E_T is obtained by the reduction technique according to formula (4). In formula (4), S_T denotes the scale of the training data set, N_y denotes the number of output neurons, y_m denotes the calculated result of the m-th output neuron, d_km denotes the ideal output of the k-th group of training data for the m-th output neuron, d_max,m denotes the maximum ideal output in the given training data, and d_min,m denotes the minimum ideal output in the given training data. In this function, each work-item need only use one group of training data for its calculation.
5. Parallel implementation of the PSO algorithm:
1) Random number generation: this kernel function is a parallel implementation of the linear congruential random number generation method. The host side first produces an initial random number seed and passes it to the GPU side, which generates random numbers generation by generation from that initial seed. The seed of each generation is the random number of the previous generation. In the parallel implementation, each work-item generates the random numbers needed for one group of connection weights.
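A serial sketch of the linear congruential scheme described above, where each new seed is the previous random number; the multiplier and increment are common textbook constants, assumed here since the patent gives none:

```python
def lcg(seed, a=1664525, c=1013904223, m=2**32):
    # Linear congruential generator: x_{n+1} = (a * x_n + c) mod m.
    # The seed of each generation is the random number of the previous
    # generation, as in step 1).
    while True:
        seed = (a * seed + c) % m
        yield seed

# In the parallel version, each work-item would run its own such stream
# from a host-supplied initial seed.
stream = lcg(42)
first_three = [next(stream) for _ in range(3)]
```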
2) Update particle velocity and position: this kernel function uses the random numbers from 1), the global best particle position and the particle's own historical best position to update the particle velocity, then uses the updated velocity together with the previous particle position to update the particle position coordinates. Each thread updates the velocity and position parameters of one group of connection weights.
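The update rule is not written out in the text; the sketch below uses the standard PSO form (inertia plus personal-best and global-best pulls), with coefficient values that are common defaults rather than values from the patent:

```python
def update_particle(pos, vel, pbest, gbest, r1, r2,
                    inertia=0.729, c1=1.49445, c2=1.49445):
    # Standard PSO update (assumed form): new velocity from inertia,
    # the pull toward the particle's historical best position, and the
    # pull toward the global best position; r1, r2 are the random
    # numbers from step 1). The position then moves by the new velocity.
    new_vel = [inertia * v + c1 * r1 * (pb - x) + c2 * r2 * (gb - x)
               for v, x, pb, gb in zip(vel, pos, pbest, gbest)]
    new_pos = [x + v for x, v in zip(pos, new_vel)]
    return new_pos, new_vel
```

In the parallel version each thread would apply this to one group of connection weights.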
3) Calculate the neural network training error: see 1) of step 4.
4) Update the local best and the particle historical best positions: the generation of a particle's historical best position is a recursive process; it only requires comparing the neural network training error of the previous historical best position with that of the newly produced particle position, and the smaller of the two is taken as the particle's historical best position coordinates. Each thread calculates the historical best position of one group of weights. The generation of the global best position requires the parallel reduction technique; its detailed process is shown in Fig. 2.
6. Parallel implementation of the BFGS quasi-Newton algorithm:
1) Calculate the neural network training error: the specific calculation process is as described in 2) of step 4.
2) Calculate the search direction: this calculation uses the Hessian matrix H and the gradient g; each work-item reads one row of the matrix H and performs a multiply-accumulate operation with the gradient g to produce one element of the search direction dir.
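The per-row multiply-accumulate can be sketched as follows; the leading minus sign assumes the usual quasi-Newton convention dir = -H g with H approximating the inverse Hessian, which the patent does not state:

```python
def search_direction(H, g):
    # Each "work-item" computes one element of dir by multiply-
    # accumulating one row of H with the gradient g; the minus sign
    # (assumed convention) makes dir a descent direction.
    return [-sum(h * gi for h, gi in zip(row, g)) for row in H]
```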
3) Determine the step length by the golden section method: the process is first to initialize a step-length interval and calculate the two golden section points of that interval, then to take the two points as candidate step lengths to update w, then to call the E_T function to compute the value corresponding to each of the two w vectors and compare them, keeping the smaller one and discarding the part of the interval outside the larger one. This is repeated until the remaining step-length interval is shorter than a fixed value, at which point the iteration stops and the step length is determined. Since the main purpose of this kernel is to update the weights w, the degree of parallelism inside the function is divided on the dimension of w, i.e. each work-item calculates one element of the weight vector w. This function calls the neural network training error evaluation function repeatedly, which requires calling one kernel from inside another; this can be realized with OpenCL versions higher than 1.1. However, since the degrees of parallelism of the two kernels are not identical, the total work-item counts are the same but the active work-items are not. To avoid the resulting work-item conflicts, all work-items are forced to synchronize before each call to the training error function.
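The interval-shrinking loop of step 3) can be sketched serially; `f` stands in for the training error evaluated at the candidate step length:

```python
def golden_section_step_length(f, lo, hi, tol=1e-5):
    # Golden-section line search as in step 3): evaluate f at the two
    # golden section points of [lo, hi], keep the point with the smaller
    # value, and discard the part of the interval outside the larger
    # one, until the interval is shorter than tol.
    phi = (5 ** 0.5 - 1) / 2  # golden ratio conjugate, ~0.618
    while hi - lo > tol:
        a = hi - phi * (hi - lo)
        b = lo + phi * (hi - lo)
        if f(a) < f(b):
            hi = b  # minimum lies in [lo, b]
        else:
            lo = a  # minimum lies in [a, hi]
    return (lo + hi) / 2
```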
4) Calculate the gradient g of the neural network training error evaluation function: the process is mainly to compute in turn the partial derivative of each element of every group of weights w, then to sum the partial derivatives of each element using the parallel reduction method, which yields the gradient of the neural network training error function. In this function, each work-item calculates the partial derivatives of all elements of the weight vector w on the basis of one group of training data, and finally a reduction summation is carried out over all work-items.
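The per-group-then-sum structure of step 4) can be sketched as follows; central differences stand in for the partial derivatives, since the patent does not give the derivative formulas:

```python
def gradient(error_fn, w, data_groups, eps=1e-6):
    # Each "work-item" computes the partial derivatives of every element
    # of w for one training group (here by a central difference, an
    # assumption), then the per-group gradients are summed, mimicking
    # the reduction summation over all work-items.
    per_group = []
    for group in data_groups:
        grad = []
        for i in range(len(w)):
            wp = list(w); wp[i] += eps
            wm = list(w); wm[i] -= eps
            grad.append((error_fn(wp, group) - error_fn(wm, group)) / (2 * eps))
        per_group.append(grad)
    return [sum(g[i] for g in per_group) for i in range(len(w))]
```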
5) Calculate the Hessian matrix H: in all formulas, the subscript k denotes the k-th iteration. From the calculated weights w and gradient g, s and z are calculated by formulas (5) and (6); then x is calculated from these two vectors according to formula (7), where s is the change in the weights, z is the change in the gradient, and x is the correction term. Finally the Hessian matrix is calculated according to formula (8). In this part, each work-item calculates the elements of one column of the Hessian matrix H. H_k is the Hessian matrix H of the k-th iteration.
s_k = w_(k+1) - w_k    (5)
z_k = g_(k+1) - g_k    (6)
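Formulas (7) and (8) are not reproduced in the text; the sketch below implements formulas (5) and (6) as given and substitutes the standard textbook BFGS inverse-Hessian update for the missing formulas, so the update rule itself is an assumption:

```python
def bfgs_update(H, w_new, w_old, g_new, g_old):
    # Formulas (5)(6): s_k = w_(k+1) - w_k, z_k = g_(k+1) - g_k.
    # Stand-in for (7)(8), the standard BFGS inverse-Hessian update:
    #   H' = (I - r s z^T) H (I - r z s^T) + r s s^T,  r = 1 / (z^T s).
    n = len(w_new)
    s = [w_new[i] - w_old[i] for i in range(n)]
    z = [g_new[i] - g_old[i] for i in range(n)]
    r = 1.0 / sum(z[i] * s[i] for i in range(n))
    A = [[(1.0 if i == j else 0.0) - r * s[i] * z[j] for j in range(n)]
         for i in range(n)]
    AH = [[sum(A[i][k] * H[k][j] for k in range(n)) for j in range(n)]
          for i in range(n)]
    # H' = A H A^T + r s s^T; in the parallel version each work-item
    # would produce one column of H'.
    return [[sum(AH[i][k] * A[j][k] for k in range(n)) + r * s[i] * s[j]
             for j in range(n)] for i in range(n)]
```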
7. Parallel reduction: the PSO-BFGS neural network training parallel algorithm needs the parallel reduction technique to handle certain summation and minimum-finding operations. Fig. 2 shows the operation of finding a minimum value using parallel reduction: the first work-item compares the first variable with the last variable and replaces the larger number with the smaller one, the second work-item compares the second variable with the second-to-last variable, and so on; the final result is obtained by the first work-item.
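The comparison pattern of Fig. 2 can be mirrored serially as follows; each loop pass plays the role of one synchronized round of work-items:

```python
def parallel_reduce_min(values):
    # Mirrors Fig. 2: in each round, "work-item" i compares element i
    # with element n-1-i and keeps the smaller in the front half; the
    # active range then halves until the minimum sits in element 0.
    vals = list(values)
    n = len(vals)
    while n > 1:
        for i in range(n // 2):
            if vals[n - 1 - i] < vals[i]:
                vals[i] = vals[n - 1 - i]
        n = (n + 1) // 2  # an odd middle element carries over unchanged
    return vals[0]
```

A summation reduction follows the same pattern with addition in place of the comparison.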
The operating process of the specific implementation is shown in Fig. 4:
(1) Neural network and particle swarm parameter setting
Fig. 1 shows the structure of the neural network. This embodiment uses a single-hidden-layer neural network. The number of input layer neurons is set according to the number of input variables of the model to be fitted, the number of output layer neurons is set according to the number of output variables of the model to be fitted, and the number of hidden layer neurons is set as needed, generally larger than the number of input layer neurons. The number of neural network weights is calculated automatically from the neuron counts set above according to the formula w = (input + output) * hidden. The particle swarm parameters are mainly the particle scale, the particle dimension, and the particle velocity and position boundaries. The particle dimension is identical to the number of neural network weights and is set automatically by the system once the neural network structure has been set; the particle scale refers to the number of particles used in the PSO algorithm and can be set freely as required; the particle velocity and position boundaries can likewise be set as required.
(2) GPU-side parameter setting
The GPU-side parameter setting mainly comprises the setting of the work-item count and the setting of the work-group scale. A work-item is the smallest unit that runs on the GPU, and its count is set according to the larger of the training data scale and the particle scale. For example, if the training data scale is 4096 groups and the particle scale is 1024, then the work-item count takes the larger of the two and is set to 4096. A certain number of work-items can be organized into a work-group, which facilitates data transfer and management between work-items within a group. The work-item count within a work-group can be set according to the number of neural network weights: for the neural network shown in Fig. 1, the weight count is (3 + 1) * 8 = 32, so the number of work-items in one work-group is set to 32.
(3) Training data import
The software can only read files in csv format, so the training data must first be stored in a csv file, which is then placed under the software directory with its filename changed to training-data.
(4) Termination condition setting
After the above steps are completed, the software termination conditions must also be set, generally comprising the setting of the maximum iteration count and the setting of the training error boundary condition. Once these are set, the software terminates the program and outputs the results when it reaches the maximum iteration count or when the training error falls below the training error boundary condition.
(5) Result recording
The output results are shown in Fig. 5 and comprise four pieces of information. The first line is the training error, indicating the accuracy of the neural network's fit to the model, 29.4891 in the figure; the second line is the iteration count, i.e. the number of iterations from program start to termination, 20 in the figure; the third line onwards gives the neural network weights obtained; the last line is the running time, in seconds.
The present invention is not limited to the embodiments described above. The above description of the specific embodiments is intended to describe and illustrate the technical solution of the present invention; the embodiments described above are only schematic and not restrictive. Without departing from the purpose of the invention and the scope of the claimed protection, those skilled in the art may, under the inspiration of the present invention, make many specific transformations of form, all of which fall within the scope of protection of the present invention.