
WO1992017849A1 - Automatic design of signal processors using neural networks - Google Patents

Automatic design of signal processors using neural networks Download PDF

Info

Publication number
WO1992017849A1
WO1992017849A1 PCT/US1992/002796 US9202796W WO9217849A1 WO 1992017849 A1 WO1992017849 A1 WO 1992017849A1 US 9202796 W US9202796 W US 9202796W WO 9217849 A1 WO9217849 A1 WO 9217849A1
Authority
WO
WIPO (PCT)
Prior art keywords
training
gain
node
neural network
learning rate
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/US1992/002796
Other languages
English (en)
Inventor
Murali M. Menon
Eric J. Van Allen
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Massachusetts Institute of Technology
Original Assignee
Massachusetts Institute of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Massachusetts Institute of Technology filed Critical Massachusetts Institute of Technology
Publication of WO1992017849A1 publication Critical patent/WO1992017849A1/fr
Anticipated expiration legal-status Critical
Ceased legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0499Feedforward networks
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/082Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/09Supervised learning

Definitions

  • the invention relates to neural networks and methods of training neural networks.
  • An important problem in signal processing is the capability to discriminate between signals originating from many different measurements. The discrimination is arbitrary in that any difference in the target or environment can serve as the basis of separation.
  • the signal processing system must evaluate an available set of measurements and determine if the signals are separable at the operating signal-to-noise ratio (SNR) .
  • a filter or transform is applied to the raw signal to obtain a representation that contains an easily separable set of features.
  • the difficulty lies in designing a transformation that maps the raw signal into a more easily separable representation. In some cases, it is clear that a certain filtering operation is appropriate, though in general, the selection of a filter that maps the raw signal to a salient data representation is a heuristic process.
  • the signal processor would "learn" the mapping required to perform signal discrimination from a known set of separable measurements.
  • the relevant information is known to be in the frequency domain, and the separability can be enhanced by applying an FFT on the signal.
  • the appropriate mapping is unknown and some heuristic application of known transformations in the literature must be attempted.
  • Neural networks offer the promise of a completely data-driven processor that automatically learns the required mapping by example.
  • the neural network based processor can simply be retrained to accommodate changes in the measurement and discrimination.
  • the neural network approach has a loose correspondence to biological nervous systems where each neuron receives input from potentially thousands of other neurons to form a nonlinear interconnected network. It is believed that these biological networks are capable of learning highly complex mappings.
  • a possible mechanism for learning in neural systems was proposed by Hebb (see, D. O. Hebb, "The Organization of Behavior," New York, N.Y., John Wiley, 1949), and this led to an interest in computer modeling of networks of "neuron-like" elements.
  • a training algorithm for multi-layer networks was then developed that allowed any desired mapping to be approximated given enough nodes in the network.
  • This approach was later redeveloped as the BEP training algorithm and applied to many different problems in the framework of parallel distributed processing (see, D.E. Rumelhart, G.E. Hinton and R.J. Williams, "Learning Internal Representations by Error Propagation," in D.E. Rumelhart and J.L. McClelland (eds.), Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Vol. 1, Cambridge, MA, MIT Press (1986)).
  • a drawback with the BEP approach is that convergence is slow due to local minima problems and it requires thousands of presentations of the training set for large dimensional problems.
  • the invention relates to a neural network based signal recognition system that "learns" an appropriate transform for a given signal type and application.
  • the recognition system is based on a multi-layer Perceptron neural network trained with a highly efficient deterministic annealing algorithm that can be two to three orders of magnitude faster than the commonly used Backward Error Propagation (BEP) technique.
  • the training algorithm is less susceptible to local minima problems.
  • the system is data driven in the sense that nodes are added until a specified level of performance is achieved, thereby making most efficient use of the available processing resources.
  • the invention features a method of training a neural network having an output layer and at least one middle layer including one or more internal nodes each of which is characterized by a node activation function having a gain.
  • the method includes the steps of setting the gain on at least some of the internal nodes equal to an initial gain value; training the multi-layer perceptron starting with the initial gain value; and changing the gain on at least some of the internal nodes during training, the gain change being in a direction which increases sensitivity of the multi-layer perceptron.
  • Preferred embodiments include the following features.
  • the neural network is a fully connected, multi-layer perceptron neural network which includes no more than one middle layer, that layer having but a single node.
  • the training employs a gradient descent training procedure, in particular, a back error propagation training procedure.
  • the training is characterized by a learning rate and the method also includes the step of decreasing that learning rate during training while also changing the gain.
  • the method further includes the step of setting the gain of each output node to a fixed value before beginning any training.
  • the internal nodes are each characterized by a sigmoid-like activation function which has the following form: f(x) = 1/(1 + e^(−βx)), where β is the gain.
  • the method further includes the step of computing an error for the neural network after the gain has reached a final gain, the error indicating how well the neural network has been trained. Also, the method includes the further steps of adding an additional node to one of the internal layers if the error exceeds a predetermined threshold; and after adding the additional node, retraining the neural network.
  • the training is supervised training using a training set made up of members for which corresponding desired outputs are known, and the said error is a measure of how far the desired outputs for the members of the training set are from actual outputs generated by applying the members of the training set to the neural network.
  • the error is computed in accordance with the following equation:
  • E = Σ_p Σ_j |o_pj − d_pj|, where p is an index identifying a member of the training set; j is an index identifying an output node; o_pj is an actual output of output node j for the p-th member of the training set; and d_pj is a desired output of output node j for the p-th member of the training set.
  • the method further includes the steps of determining whether the training is converging; and if it is determined that the training is not converging, modifying the training by increasing the learning rate so as to cause an instability in training to occur.
  • the method also includes the step of resuming training at a reduced learning rate after training for a preselected period of time with the increased learning rate.
  • the invention features an apparatus for training a neural network having an output layer and at least one middle layer which includes one or more internal nodes each of which is characterized by a node activation function having a gain.
  • the apparatus includes means for setting the gain on at least some of the internal nodes equal to an initial gain value; means for training the multi-layer perceptron starting with the initial gain value; and means for changing the gain on at least some of the internal nodes during training, the gain change being in a direction which increases sensitivity of the multi-layer perceptron.
  • the internal nodes are each characterized by a sigmoid-like activation function of the following form: f(x) = 1/(1 + e^(−βx)), where β is the gain.
  • the apparatus also includes means for computing an error for the neural network after the gain has reached a final gain, the error indicating how well the neural network has been trained. It further includes means for adding an additional node to one of the internal layers if the error exceeds a predetermined threshold; and means for causing the training means to retrain the neural network after the additional node has been added.
  • One advantage of the invention is that it can find a solution for architectures which appear to be insufficient based upon previous training techniques. For many problems involving real sensor signals, the invention arrives at architectures requiring fewer than 10 to 15 internal ("hidden") nodes to achieve the desired signal discrimination. In addition, the invention enables one to train a neural network on a very limited portion of the data and still achieve good generalization to the remainder of the data set. Moreover, the performance of the training algorithm is not particularly dependent on the order in which the neural network is trained. Other advantages and features will become apparent from the following description of the preferred embodiment and from the claims.
  • Figs. 2a and 2b present a flow chart of the gradient descent gain annealing (GDGA) algorithm for training a multi-layer perceptron;
  • Fig. 3 shows the testing performance of an MLP network as a function of the percent of the data set used for training;
  • Fig. 4 shows the training performance of a single-hidden-node MLP network as a function of the percent of the data set used for training; and Fig. 5 is a comparison of the average testing performance of an MLP network trained on 1% of the data set.
  • a multi-layer perceptron (MLP) neural network 10 is made up of an input layer 12 of input nodes 14 followed by an internal "hidden" layer 16 of internal nodes 18 that are connected to an output layer 20 of output nodes 22.
  • MLP network 10 is a fully interconnected MLP network operating in a feed-forward mode.
  • each node in a given layer is connected to every node in the next higher layer, and conversely, every node at any level above input layer 12 receives input from every node on the next lower level.
  • the node labelled "A", i.e., a representative node 14 of input layer 12, is connected to every node 18 in the next higher, internal layer 16, and the node labelled "B", i.e., a representative node 18 of internal layer 16, is connected to every node 14 in the input layer 12.
  • the depicted MLP network has only a single hidden layer 16, it could have more than one hidden layer depending upon the complexity and type of problem being modeled.
  • the number of input nodes 14 which are actually used depends on the dimensionality of the signal which will be fed into MLP network 10. For example, if the input signal is an M-point FFT, it may be necessary to use M input nodes.
  • Each node in MLP network 10 is characterized by a particular node activation function f(x) and an offset θ, and each connection between node j in one layer and node k in the next lower level is characterized by a weight, w_kj.
  • the activation functions for internal nodes 18 and output nodes 22 are sigmoid functions having the following form: f(x) = 1/(1 + e^(−βx)), where β is the gain (Eq. 1).
  • the output for node j on level l is as follows: O_j(l) = f[(Σ_k O_k(l−1) · w_kj) + θ_j(l)].
  • In general, a modified BEP training procedure is used to train MLP network 10. It is modified by starting the system at a small magnitude for the gain and annealing the system to a large gain. At each gain value, the BEP algorithm is run to convergence. The gain is a variable which has the characteristic that changing it deforms the energy surface with respect to the other free parameters (i.e., the weights and offsets). At low gain values, the MLP network has a nearly flat energy landscape, and the search covers a large portion of the parameter space.
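The role of the gain in the sigmoid of Eq. 1 can be illustrated with a short sketch (a minimal illustration; the function and variable names below are assumptions, not the patent's notation):

```python
import math

def node_activation(x, gain):
    """Sigmoid node activation with an explicit gain (slope) term:
    f(x) = 1 / (1 + exp(-gain * x))."""
    return 1.0 / (1.0 + math.exp(-gain * x))

# Low gain: a soft, nearly linear response (nearly flat energy landscape);
# high gain: an almost hard threshold (sharp decision boundary).
soft = node_activation(1.0, 0.1)    # close to 0.5
sharp = node_activation(1.0, 10.0)  # close to 1.0
```

Annealing from a low to a high gain thus moves the network from a broad, exploratory search of the parameter space toward sharply defined decision boundaries.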
  • the dynamic architecture MLP network has the ability to grow to accommodate the complexity of the problem and make efficient use of the available resources. The GDGA training procedure will now be described in greater detail.
  • the MLP network includes only a single hidden layer. However, it should be understood that the procedure applies to other MLP networks and other architectures, including those with multiple hidden layers.
  • the steps of a GDGA training algorithm 100 are presented in Figs. 2a-b.
  • Training algorithm 100, which implements a supervised training schedule, begins with the selection of a set of input signals for which the desired outputs are known, i.e., a training set (step 102).
  • MLP network 10 is initialized, which involves setting the gain of all of the output nodes to a low fixed quantity, e.g., 2 (step 104).
  • This approach as compared to other prior art approaches to this problem, has the advantage of proceeding from simpler architectures to more complex ones based upon the demands of the problem rather than by starting with an architecture which is unnecessarily complex for the problem at hand and then trying to pare away unneeded nodes.
  • the weights and offsets of all of the internal and output nodes 18 and 22 and connections are set to some small random values (step 108) .
  • the values for the weights and offsets are selected by using the following algorithm: 2.0 * RANDOM - 1.0, where RANDOM is a random number generating function which produces a number between 0 and 1.
  • the output of RANDOM is scaled and shifted so as to yield a distribution of randomly generated numbers centered on zero, which thus introduces no bias into the initialization of MLP network 10.
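The zero-centered initialization described above can be sketched as follows (the helper name is hypothetical; `random.random()` plays the role of RANDOM):

```python
import random

def random_init(n):
    """Initialize n weights/offsets to small random values via
    2.0 * RANDOM - 1.0, giving a distribution centered on zero."""
    return [2.0 * random.random() - 1.0 for _ in range(n)]
```

Because the distribution is symmetric about zero, no directional bias is introduced into MLP network 10 before training begins.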
  • algorithm 100 defines the ranges over which the gain β of internal nodes 18 and the learning rate for the subsequent training will be permitted to vary during the gain annealing process (step 110). It then initializes the gain and learning rate to their initial values (step 112).
  • β_init, the initial gain value
  • β_final, the final gain value, is set to 10.0
  • η_init, the initial learning rate, is set to 0.03
  • η_final, the final learning rate
  • Algorithm 100 also initializes an energy variable E_k equal to some large number.
  • E_k serves to keep track of the minimum energy which is achieved during the training procedure. Setting E_k to a large number assures that the first computed energy for MLP network 10 will be smaller than the initial value of E_k.
  • E_old is a measure of the total error between all of the training set signals and the desired outputs for those signals (step 114).
  • the expression for computing E_old is as follows: E_old = Σ_p Σ_j |o_pj − d_pj|, where o_pj is the actual output of output node j for the p-th signal of the training set, and d_pj is the desired output of output node j for the p-th signal of the training set.
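A minimal sketch of this energy computation (names assumed; each element of `actual_outputs` and `desired_outputs` is the per-pattern vector of output-node values):

```python
def network_energy(actual_outputs, desired_outputs):
    """Total error over the training set:
    E = sum over patterns p and output nodes j of |o_pj - d_pj|."""
    return sum(abs(o - d)
               for o_p, d_p in zip(actual_outputs, desired_outputs)
               for o, d in zip(o_p, d_p))
```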
  • after computing E_old, algorithm 100 begins training MLP network 10 to adjust the weights and offsets using a back error propagation (BEP) procedure such as is well known to those skilled in the art (step 118).
  • the BEP training procedure is run in the mode in which the weights and offsets are adjusted for each signal pattern of the training set rather than for the entire training set at once.
  • one iteration of the BEP training procedure consists of a separate training for each of the members of the training set.
  • the BEP training continues through multiple iterations until either the desired convergence is achieved or the number of iterations exceeds some threshold amount, indicating that the procedure is not converging.
  • Algorithm 100 keeps track of the number of iterations which are performed for a given gain and learning rate to determine whether the training procedure becomes stuck and fails to converge.
  • algorithm 100 computes E_new, the energy for MLP network 10 resulting from that iteration of training (step 120).
  • E_new is then compared to E_k (step 122). If it is smaller than E_k, the value of E_k is set equal to E_new and the weights, offsets, and gain for that new minimum are saved (step 124).
  • algorithm 100 determines whether the number of iterations which have been performed during this loop of the BEP training procedure has exceeded 50 (step 126). During the initial iterations of the training, algorithm 100 will of course detect that the number of iterations does not exceed 50, and it will then determine whether the desired convergence toward a global solution is occurring (step 134). Algorithm 100 performs the convergence test by comparing the relative difference between E_new and E_old to some threshold level. In particular, algorithm 100 computes the absolute value of (E_new − E_old)/E_new and checks whether it is greater than 0.001.
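The convergence test above can be sketched as (the function name is an assumption):

```python
def still_converging(e_new, e_old, threshold=0.001):
    """Relative-change test: return True while training at the current
    gain is still making progress, i.e. while |(e_new - e_old) / e_new|
    exceeds the threshold."""
    return abs((e_new - e_old) / e_new) > threshold
```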
  • algorithm 100 sets the value of E_old to E_new (step 135) and moves on to the next iteration of the BEP training procedure (i.e., algorithm 100 branches back to step 118).
  • if the training procedure gets trapped in a local minimum, causing the value of E to oscillate from one iteration to the next, it may be necessary to force the system out of that local minimum.
  • the iteration count indicates when such a problem occurs by rising above 50 (see step 126) .
  • when algorithm 100 detects that the iteration count has exceeded 50, it "kicks" the system by boosting the learning rate to a very high number, e.g., 0.75 (step 128). After the learning rate has been increased to 0.75, algorithm 100 performs ten iterations of the BEP training procedure (step 130). Forcing a high learning rate during BEP training causes the system to become unstable and thus dislodges it from the local minimum.
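A sketch of this "kick" step, assuming a hypothetical `run_bep_iteration` callback that performs one BEP pass at a given learning rate:

```python
def kick(run_bep_iteration, normal_rate, kick_rate=0.75, kick_iterations=10):
    """Briefly force a very high learning rate to destabilize training and
    dislodge it from a local minimum; training then resumes at the
    normal (reduced) rate returned to the caller."""
    for _ in range(kick_iterations):
        run_bep_iteration(kick_rate)
    return normal_rate
```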
  • after the tenth iteration, algorithm 100 jumps to the next higher gain and the next lower learning rate, and branches back to step 118 to proceed with the BEP training with the new set of initial values for the state variables. It should be noted that in the described embodiment, algorithm 100 moves through the range of permissible gains and the range of permissible learning rates in a linear fashion, one jump at a time. Each step in gain is equal to (β_final − β_init)/5, and each step in learning rate is equal to (η_init − η_final)/5. In addition, when algorithm 100 increases the gain by one step, at the same time it also decreases the learning rate by one step.
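The linear gain/learning-rate schedule can be sketched as follows (a minimal sketch assuming five steps as in the described embodiment; function and parameter names are not the patent's):

```python
def annealing_schedule(gain_init, gain_final, lr_init, lr_final, n_steps=5):
    """Linear annealing: each step raises the gain by
    (gain_final - gain_init)/n_steps and simultaneously lowers the
    learning rate by (lr_init - lr_final)/n_steps."""
    gain_step = (gain_final - gain_init) / n_steps
    lr_step = (lr_init - lr_final) / n_steps
    return [(gain_init + k * gain_step, lr_init - k * lr_step)
            for k in range(n_steps + 1)]
```

Each (gain, learning rate) pair in the returned list is then used as the starting point for another run of BEP training to convergence.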
  • at step 134, if the relative change in the magnitude of the energy does not exceed the threshold value, algorithm 100 prepares to move on to the next higher gain level. First, it sets the value of E_old to E_new (step 139). Then, it checks β to determine whether it has reached the maximum gain level allowed (step 138). If β is less than β_final, algorithm 100 jumps to the next gain and learning rate (step 140) and then branches back to step 118 to repeat the above-described BEP training procedure.
  • algorithm 100 adds a third node and again branches back to step 108 to see what effect the third node yields (step 148) .
  • Algorithm 100 continues adding nodes until the resulting improvement in performance is no greater than 10%. At that point, algorithm 100 selects the structure and values of the state variables which yielded the lowest energy and terminates.
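The node-addition loop can be sketched as follows (the `train_and_score` callable and the exact stopping test are assumptions based on the 10% criterion above; it is assumed to train an n-hidden-node network and return its performance as a fraction correct):

```python
def grow_network(train_and_score, max_nodes=15, min_relative_gain=0.10):
    """Dynamic-architecture loop: retrain with one more hidden node at a
    time, stopping once the relative improvement over the previous size
    is no greater than 10%, then keep the best configuration seen."""
    n = 1
    prev_score = train_and_score(n)
    best_n, best_score = n, prev_score
    while n < max_nodes:
        n += 1
        score = train_and_score(n)
        if score > best_score:
            best_n, best_score = n, score
        if (score - prev_score) / prev_score <= min_relative_gain:
            break  # improvement too small; stop growing
        prev_score = score
    return best_n, best_score
```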
  • the node function partitions different regions of the input space by constructing hyperplanes to approximate the region boundaries.
  • the input to each node given by Eq. 2, is a linear equation for a plane in multidimensional space. As more hidden layer nodes are used the actual boundaries are more closely approximated.
  • the "sigmoid" transfer function implements a sharp or fuzzy boundary based on a high or low magnitude of the gain term in Eq. 1. An important characteristic of the sigmoid function is that it acts globally across the input space, thereby allowing the possibility of forming a compact representation of the salient features in the training data.
  • An MLP network trained with the GDGA algorithm was evaluated using actual radar signatures.
  • the task was to separate the radar signatures into two classes: object types A and B.
  • the problem is difficult because the effects of object geometry and measurement conditions on the radar signature are not well characterized. As a consequence, the signatures are not easy to discriminate, and it is not clear which transformation will increase the separability.
  • a data set consisting of 3692 radar signatures (equal numbers of types A and B) was used in this study.
  • the training procedure consisted of initializing the network weights and offsets to a set of random values and presenting a certain percent of the data set in a random order to train the network. After training, the weights and offsets were fixed and the entire data set (3592 signatures) was used to test the network. The combination of training and then testing the network is defined as a trial.
  • a specific MLP network was evaluated by running 100 trials, where the weights and offsets are initialized to a different set of random values at each trial. Each trial also selected a different (random) training set. The performance was defined as the percent of the input patterns correctly classified, based on the distance between the network output and a set of outputs for the two signature types. The network had a single output node that has output values of 0.95 for type A and 0.05 for type B signatures. For each input pattern, the target class with the minimum (Euclidean) distance to the network output was chosen as the pattern class. The trial with the maximum percent correct during testing is used for performance comparisons. The percent of the data set used for training and the number of nodes in the hidden layer were treated as independent parameters in the experiment.
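The nearest-target classification rule can be sketched as follows (a minimal sketch using the target values 0.95 and 0.05 stated above; with a single output node the Euclidean distance reduces to an absolute difference):

```python
def classify(network_output, target_a=0.95, target_b=0.05):
    """Assign the class whose target output value is closest to the
    single network output."""
    if abs(network_output - target_a) <= abs(network_output - target_b):
        return "A"
    return "B"
```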
  • this is shown in Fig. 4, where the percent correct during training for a single-node network is shown as a function of the percent of the data set used for training.
  • the limited capacity of a single hidden layer node is shown by the decrease in training set performance from 100% to 93% correct. This performance decrease occurred when the training set size was increased from 1% to 25% of the entire data set.
  • the testing performance is significantly increased by adding nodes to the hidden layer. After five nodes though, any further node addition only slightly improves the performance.
  • the ten node network could account for 95% of the data after training on 20% of the data and was able to attain 97% correct during testing as the training percentage is increased. Further investigation showed that the remaining 3% of the data that the network could not account for were actually bad measurements. Apparently the network was able to discriminate between signatures and also identify a non-signature without being explicitly trained as to what constitutes a bad measurement.
  • the network was also trained with the standard BEP algorithm and the testing results are shown in Fig. 5.
  • the generalization capability of the network trained with BEP was less than a network trained with the GDGA algorithm. This effect was most pronounced when training on a very small percent of the data set (a situation that is especially relevant to real world problems) .
  • the GDGA technique is able to explore the state space of the network more thoroughly at the low gain values than BEP training operating at a single gain.
  • the BEP algorithm was initialized to the optimum value of gain found by training with the GDGA algorithm.
  • the "history" of training at many different gain values was apparently significant to the generalization capability of the network.
  • Both the BEP and GDGA algorithms required approximately the same number of iterations for training.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Feedback Control In General (AREA)

Abstract

A method of training a neural network (1) having an output layer (20) and at least one middle layer (16) comprising one or more internal nodes (18), each of which is characterized by a node activation function having a gain. The method consists of setting the gain on at least some of the internal nodes equal to an initial gain value; training the multi-layer perceptron starting with the initial gain value; and changing the gain on at least some of the internal nodes during training, the gain change being in a direction which increases the sensitivity of the multi-layer perceptron.
PCT/US1992/002796 1991-04-02 1992-04-01 Conception automatique de processeurs de signaux a l'aide de reseaux neuronaux Ceased WO1992017849A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US67922591A 1991-04-02 1991-04-02
US679,225 1991-04-02

Publications (1)

Publication Number Publication Date
WO1992017849A1 true WO1992017849A1 (fr) 1992-10-15

Family

ID=24726072

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US1992/002796 Ceased WO1992017849A1 (fr) 1991-04-02 1992-04-01 Conception automatique de processeurs de signaux a l'aide de reseaux neuronaux

Country Status (1)

Country Link
WO (1) WO1992017849A1 (fr)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2001020364A1 (fr) * 1999-09-10 2001-03-22 Henning Trappe Procede de donnees de mesure sismiques au moyen d'un reseau neuronal

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5033006A (en) * 1989-03-13 1991-07-16 Sharp Kabushiki Kaisha Self-extending neural-network

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5033006A (en) * 1989-03-13 1991-07-16 Sharp Kabushiki Kaisha Self-extending neural-network

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
RUMELHART et al., "Learning Internal Representations by Error Propagation", PARALLEL DISTRIBUTED PROCESSING, Volume 1, Foundations, MIT Press, 1986. *
VOGL et al., "Accelerating the Convergence of the Back-Propagation Method", BIOLOGICAL CYBERNETICS, Springer-Verlag, 1988, pages 250, 259, 260. *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2001020364A1 (fr) * 1999-09-10 2001-03-22 Henning Trappe Procede de donnees de mesure sismiques au moyen d'un reseau neuronal
US6725163B1 (en) 1999-09-10 2004-04-20 Henning Trappe Method for processing seismic measured data with a neuronal network

Similar Documents

Publication Publication Date Title
Pal et al. Multilayer perceptron, fuzzy sets, and classification
US6167390A (en) Facet classification neural network
Sutton et al. Online learning with random representations.
Murray et al. Synaptic weight noise during multilayer perceptron training: fault tolerance and training improvements
Maclin et al. Combining the predictions of multiple classifiers: Using competitive learning to initialize neural networks
Denoeux et al. Initializing back propagation networks with prototypes
Billings et al. The determination of multivariable nonlinear models for dynamic systems using neural networks
US5943661A (en) Hybrid neural network classifier, systems and methods
Yoon et al. Training algorithm with incomplete data for feed-forward neural networks
US5469530A (en) Unsupervised training method for a neural net and a neural net classifier device
US6965885B2 (en) Self-organizing feature map with improved performance by non-monotonic variation of the learning rate
Du et al. Multilayer perceptrons: architecture and error backpropagation
Lee et al. A two-stage neural network approach for ARMA model identification with ESACF
WO1992017849A1 (fr) Conception automatique de processeurs de signaux a l'aide de reseaux neuronaux
Moreno et al. Efficient adaptive learning for classification tasks with binary units
Kia et al. Unsupervised clustering and centroid estimation using dynamic competitive learning
Taheri et al. Artificial neural networks
WO1991002322A1 (fr) Reseau neural de propagation de configurations
Karouia et al. Performance analysis of a MLP weight initialization algorithm.
Wann et al. Clustering with unsupervised learning neural networks: a comparative study
Hartono et al. Adaptive neural network ensemble that learns from imperfect supervisor
Owens et al. A multi-output-layer perceptron
Kim et al. Pattern classification of vibration signatures using unsupervised artificial neural network
de Paula Canuto Combining neural networks and fuzzy logic for applications in character recognition
Villalobos et al. Learning Evaluation and Pruning Techniques

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): CA JP

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): AT BE CH DE DK ES FR GB GR IT LU MC NL SE

DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
122 Ep: pct application non-entry in european phase
NENP Non-entry into the national phase

Ref country code: CA