DESCRIPTION
SYSTEM AND METHOD FOR IMPLEMENTING RESERVOIR COMPUTING USING CELLULAR AUTOMATA
Field of the invention
The present invention relates to a system and method for implementing a specific class of recurrent neural network algorithms, called reservoir computing, using cellular automata.
Background of the invention
Recurrent Neural Networks (RNNs) are connectionist computational models that utilize distributed representations and the nonlinear dynamics of their units. Information in RNNs is propagated and processed in time through the states of their hidden units, which makes them appropriate tools for sequential information processing. There are two broad types of RNNs: stochastic energy-based models with symmetric connections, and deterministic models with directed connections.
RNNs are known to be Turing complete computational models (Siegelmann and Sontag, 1995) and universal approximators of dynamical systems (Funahashi and Nakamura, 1993). They are especially appealing for problems that require remembering long-range statistical relationships, such as speech, natural language processing, video processing, and financial data analysis. Additionally, RNNs have been shown to be very successful generative models for data completion tasks (Salakhutdinov and Hinton, 2012).
Despite their immense potential as universal computers, difficulties in training RNNs arise from the inherent difficulty of learning long-term dependencies (Hochreiter, 1991; Bengio et al., 1994; and see Hochreiter and Schmidhuber, 1997) and from convergence issues (Doya, 1992). However, recent advances suggest promising approaches for overcoming these issues, such as utilizing a reservoir of coupled oscillators (Maass et al., 2002; Jaeger, 2001).
Reservoir computing (echo state networks or liquid state machines) alleviates the problem of training a recurrent network by using a static dynamical reservoir of coupled oscillators operating at the edge of chaos. Many dynamical systems of this type are claimed to possess high computational power (Bertschinger and Natschlager, 2004; Legenstein and Maass, 2007). In this approach, the rich dynamics already provided by the reservoir remove the need to train many recurrent layers, and learning takes place only at the output (or read-out) layer. This simplification enables the use of recurrent neural networks in complicated tasks that require memory for long-range (both spatial and temporal) statistical relationships.
The essential feature of the network in the reservoir is called the echo state property (Jaeger, 2001). In networks with this property, the effect of the previous state and previous input dissipates gradually in the network without being amplified. For specific network architectures with tanh node nonlinearities, this corresponds to the weight matrix having a spectral radius less than 1. In classical echo state networks the network is generated randomly and sparsely, subject to the spectral radius requirement on the weight matrix. Even though the spectral radius constraint ensures stability of the network to some extent, it says nothing about the short-term memory capacity of the network. Knowledge of this capacity is essential for proper design of the reservoir for the given task.
The reservoir is expected to operate at the edge of chaos because dynamical systems are shown to exhibit high computational power in this regime (Bertschinger and Natschlager, 2004; Legenstein and Maass, 2007). High memory capacity has also been shown for reservoirs at the edge of chaos. The Lyapunov exponent is a measure of edge-of-chaos operation in a dynamical system, and it can be computed empirically for a reservoir network (Legenstein and Maass, 2007). However, this computation is neither trivial nor automatic, and needs expert intervention (Lukosevicius and Jaeger, 2009).
It has been shown empirically that there is an optimum Lyapunov exponent of the reservoir network, related to the amount of memory needed for the task (Verstraeten et al., 2007). Thus, fine-tuning the connections in the reservoir to learn the connections that lead to the optimal Lyapunov exponent is crucial for achieving good performance with the reservoir. Many types of learning methods have been proposed for tuning the reservoir connections (see Lukosevicius and Jaeger, 2009 for a review); however, optimization procedures on the weight matrix are prone to getting stuck at a local optimum due to high curvature in the weight space. It was thought that the problem could be eased by imposing structural constraints on the connectivity instead of random initial weights, but this has been shown to be ineffective in improving the performance of the reservoir (Liebald, 2004).
The input in a complex task is generated by multiple different processes, whose dynamics and spatio-temporal correlations might be very different. One important shortcoming of the classical reservoir computing approach is its inability to deal with multiple spatio-temporal scales simultaneously. Modular reservoirs have been proposed that contain many decoupled sub-reservoirs operating at different scales; however, fine-tuning the sub-reservoirs according to the task is not trivial.
A cellular automaton is a discrete computational model consisting of a regular grid of cells, each in one of a finite number of states. The state of an individual cell evolves in time according to a fixed rule, depending on its current state and the states of its neighbors. The information presented as the initial states of a grid of cells is processed in the state transitions of the cellular automaton. Cellular automata governed by certain rules are proven to be computationally universal, capable of simulating a Turing machine (Cook, 2004).
The rules of cellular automata are classified (Wolfram, 2002) according to their behavior: attractor, oscillating, chaotic, and edge of chaos. Some of the rules in the last class are shown to be Turing complete (rule 110, Conway's Game of Life). The Lyapunov exponent of a cellular automaton can be computed, and it is shown to be a good indicator of the computational power of the automaton (Baetens and De Baets, 2010). A spectrum of Lyapunov exponent values can be achieved using different cellular automaton rules. Therefore a dynamical system with a specific memory capacity (i.e., Lyapunov exponent value) can be constructed by using a corresponding cellular automaton.
Cellular automata have been previously used for associative memory and classification tasks. Tzionas et al. (1994) proposed a cellular automaton based classification algorithm. Their algorithm clusters 2D data using cellular automata, creating boundaries between different seeds in the 2D lattice. The partitioned 2D space creates a geometrical structure resembling a Voronoi diagram. Different data points belonging to the same class fall into the same island in the Voronoi structure, hence are attracted to the same basin. The clustering property of cellular automata is exploited in a family of approaches, using rules that form attractors in lattice space (Chady and Poli, 1997; Ganguly et al., 2003; Ganguly, 2004). The attractor dynamics of cellular automata resemble Hopfield network architectures
(Hopfield, 1982). These approaches have two major problems: low dimensionality and low computational power. The first problem is due to the need to represent data in 2D space and the need for non-trivial algorithms in higher dimensions. The second problem is due to limiting the computational representation of cellular automaton activity to attractor dynamics and clustering. The time evolution of cellular automaton activity provides a very rich computational representation, especially for edge-of-chaos dynamics, but this is not exploited if the presented data are classified only according to the converged basin in 2D space. Another approach is the cellular neural network (Chua and Yang, 1988a and 1988b; Austin et al., 1997), which emulates memory formation in a neural network but is incapable of the chaotic behavior essential for high computational power.
Cellular neural network architectures are patented in EP0649099 B1 and EP0797165 A1. A cellular automaton is used for audio compression in patent US6567781 B1. Patent WO1997012330 A1 defines a method for encoding and decoding data using cellular automata. There are patents that implement reservoir computing in software for specific purposes. Patents US 20130060772 A1 and US 8301628 B2 suggest using an echo state network for ontology generation, and patent EP 2389669 A1 proposes using a reservoir computing based neural network for geodatabase information processing.
Objects of the invention
The object of the invention is to provide a method for implementing a reservoir computing based recurrent neural network using cellular automata. Cellular automata replace the echo state neural network of classical reservoir computing. A cellular automaton rule search is executed for reservoir training, instead of tuning of echo state network connections.
Detailed description of the invention
In our reservoir computing method, data are passed to a cellular automaton instead of an echo state network, and the nonlinear dynamics of the cellular automaton provide the necessary projection of the input data onto a nonlinear space. In this configuration, instead of non-trivial fine-tuning of network connections, a search is performed in the rule space of the cellular automaton for the rule that has the optimal Lyapunov exponent for the task. Additionally, utilization of 'edge of chaos' automaton rules ensures Turing complete computation in the reservoir, which is very hard to achieve using classical reservoir computing approaches. Similar to the usage of modular reservoirs, a hybrid automaton (Sipper et al., 1997) or a hierarchical automaton (Sikdar et al., 2001) is used to handle different spatio-temporal scales in the input. A non-deterministic rule can be used for the automaton to introduce randomness into the reservoir.
The algorithmic flow of our method is shown in Figure 1. The reservoir computing system receives the input data (101). The encoding stage (102) translates the input into the initial states of a multidimensional cellular automaton. In the reservoir computing stage (103), the cellular automaton rules are executed for a fixed period of time (T) to evolve the initial states. The evolution of the cellular automaton is recorded such that, at each time step, a snapshot of the states of the cellular automaton is saved into a data structure. Then, in the decoding stage (104), the recorded cellular automaton activity is processed to output a cellular automaton representation of the given input. This output (data vector, 303) is a projection of the input onto a nonlinear cellular automaton state space. The decoded cellular automaton output is then used for further processing (105) according to the task (e.g. classification, compression, clustering). The system output (106) comes from this stage, and it is communicated to the outside world.
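The stages above can be sketched in a few lines of Python. This is an illustrative, simplified instance only, not the full invention: it uses a 1D elementary cellular automaton (here Wolfram rule 110, an edge-of-chaos rule) in place of the multidimensional automaton, and the function names and the trivial bit-vector encoding are our own assumptions for the sketch.

```python
import numpy as np

def evolve_eca(state, rule, steps):
    """Evolve a 1D elementary cellular automaton, recording every snapshot."""
    table = [(rule >> n) & 1 for n in range(8)]   # Wolfram rule number -> lookup table
    history = [state.copy()]
    for _ in range(steps):
        # neighborhood (left, center, right) encoded as an integer 0..7
        idx = 4 * np.roll(state, 1) + 2 * state + np.roll(state, -1)
        state = np.array([table[n] for n in idx], dtype=np.int8)
        history.append(state.copy())
    return np.array(history)                      # shape: (steps + 1, n_cells)

def reservoir_features(x_bits, rule=110, steps=8):
    """Encode a binary input as initial states (102), evolve the automaton
    for T steps while recording snapshots (103), and flatten the recorded
    evolution into a feature vector (104)."""
    history = evolve_eca(np.asarray(x_bits, dtype=np.int8), rule, steps)
    return history.flatten()

features = reservoir_features([0, 1, 1, 0, 1, 0, 0, 1], rule=110, steps=8)
```

The resulting feature vector is what a task-specific read-out stage (105) would consume; a linear classifier trained on such vectors is the analogue of the read-out layer in classical reservoir computing.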
The sub-steps of the encoding stage (102) are given in Figure 2. The input data instance is first pre-processed (201); this can be a sequence of operations (filtering, whitening, dimensionality reduction, transformations such as Fourier or wavelet) that modifies and transforms the data. At the end of pre-processing the data have an inherent number of dimensions, K. In the mapping stage (202), these dimensions are separately mapped onto the cellular automaton cells. K can be much larger than the number of cellular automaton dimensions (P), and in that case two alternatives are proposed for the mapping algorithm:
1. The mapping algorithm exploits spatial and temporal partitioning. An input dimension can be encoded in a specific spatial region of the cellular automaton at a specific epoch (time). In Figure 3, a representative 2D cellular automaton is depicted to illustrate the mapping sub-step. Suppose we have a K-dimensional pre-processed data vector X, and X_k is the kth dimension of the vector. A_m^e denotes the initial states of the cells in the mth spatial region (partition) of the cellular automaton at epoch e. This is depicted in Figure 3 as arrow 303. Spatial regions can possibly live in any subspace of the ambient space. Then, in the mapping stage, a function f maps the kth component of the input (which can be integer or binary) into the initial states of the cells:

A_m^e = f(X_k) (305)
Here, f can simply be binary coding for a binary cellular automaton, or any complex quantization function. The same spatial region, m, can represent another dimension of the input at another time epoch, e+1, as shown by 304 in Figure 3. Therefore, the input is translated into a spatio-temporal code in the initial states (203) of the cellular automaton cells. As the dimensionality of the input data (201) increases, the complexity of the mapping algorithm and the need for dimensionality-reducing pre-processing steps (i.e. principal component analysis) also increase.
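The spatio-temporal mapping just described can be sketched as follows. This is a minimal illustration under our own assumptions: f is taken to be uniform quantization followed by binary coding, each input dimension occupies one spatial region of a flattened grid, and dimensions that do not fit in one grid spill over into the next epoch; the function names are hypothetical.

```python
import numpy as np

def binarize(value, bits=4, lo=0.0, hi=1.0):
    """One simple choice for the quantization function f: uniform
    quantization of a real value in [lo, hi], then binary coding."""
    level = int(np.clip((value - lo) / (hi - lo), 0.0, 1.0) * (2 ** bits - 1))
    return [(level >> b) & 1 for b in reversed(range(bits))]

def map_input(x, regions_per_epoch, bits=4):
    """Map a K-dimensional vector onto initial CA states: each dimension
    occupies one spatial region; when all regions of the grid are used,
    the remaining dimensions spill over into the next epoch."""
    epochs = []
    for start in range(0, len(x), regions_per_epoch):
        chunk = x[start:start + regions_per_epoch]
        grid = [bit for v in chunk for bit in binarize(v, bits)]
        epochs.append(np.array(grid, dtype=np.int8))
    return epochs  # one array of initial states per epoch

epochs = map_input([0.1, 0.9, 0.5, 0.3, 0.7], regions_per_epoch=2, bits=4)
```

With five input dimensions and two regions per grid, the mapping produces three epochs, each an independent initialization of the automaton.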
Each epoch (301 and 302) of the cellular automaton is considered an independent initialization and is evolved separately in the reservoir computing stage (103), but with the same cellular automaton rule. Figure 4 gives the time evolution of a representative 2D cellular automaton, initialized as epoch e. 401 and 402 are the states of the cells at times t and t+1 of the cellular automaton evolution. s_i^t (403) is the value at cell i at time t, and S^e(t) denotes the states of all cells at time t for epoch e. The evolution of the cellular automaton is saved at each time instant into a separate data structure for each epoch:

C^e = [S^e(1); S^e(2); ...; S^e(T)]
The union of the evolutions of the epoch automata gives the data structure output of the cellular automaton for a given input:

C = C^1 ∪ C^2 ∪ ... ∪ C^E
In this configuration, the feature space of the pre-processed input is partitioned into subspaces, and each subspace is processed separately. This separation limits the feature interactions in the cellular automaton processing, but decorrelation and whitening pre-processing steps will reduce the detrimental effect of this lack of interactions.
2. In a more holistic approach, each cell of a real-valued cellular automaton receives weighted input from every feature dimension (or a subset of the feature dimensions) of the pre-processed input (Figure 5). A representative real-valued 2D cellular automaton is shown (501). Each cell (504) receives a weighted (503) sum of initial excitation from the pre-processed input vector (502). The initial value of a cell i (504) is given by:

s_i^0 = Σ_k w_ik X_k
Alternatively, instead of receiving input from the whole set of feature dimensions, a single cell can receive input from a subset of the feature dimensions. In that case, the weight vector for a cell is sparse, and a subspace of the input is processed by specific cells. The weights (503) can be set randomly, as in echo state networks.
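The weighted-sum initialization and its sparse variant amount to a single matrix-vector product. The sketch below assumes illustrative sizes (16 feature dimensions, a flattened grid of 64 cells) and a 25% connection density for the sparse case; these numbers are placeholders, not part of the invention.

```python
import numpy as np

rng = np.random.default_rng(0)
K, n_cells = 16, 64                      # feature dimensions, cells in the grid

W = rng.normal(size=(n_cells, K))        # random weights (503), as in echo state networks
mask = rng.random((n_cells, K)) < 0.25   # sparse variant: each cell sees only
W_sparse = W * mask                      # a random subset of the dimensions

x = rng.random(K)                        # pre-processed input vector (502)
s0 = W @ x                               # dense initial cell values (504)
s0_sparse = W_sparse @ x                 # sparse-connectivity alternative
```

Reshaping `s0` to the grid shape yields the initial excitation of the real-valued automaton before its rule is applied.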
The evolution of the cellular automaton is appended at each time instant to get the data structure output of the cellular automaton for a given input:

C = [S(1); S(2); ...; S(T)]
In the decoding stage (Figure 6, 104), the multidimensional data structure C (404, 405) is first post-processed (601). This can be filtering (e.g. low-pass, high-pass, denoising), normalization, whitening operations, or a combination of these. Then, dimensionality reduction methods (e.g. pooling, linear or kernel PCA, manifold learning) are applied to the post-processed data structure values (602). This step can be skipped altogether if the subsequent stage (105) is able to handle the dimensionality of the data. The values of the cells in the data structure, or the output of the dimensionality reduction method, are vectorized and assigned as the output (603) of the decoding stage, and hence of the reservoir.
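A minimal sketch of the decoding stage, assuming average pooling over time as the dimensionality reduction method and a recorded evolution stored as a (time, cells) array; both the function name and the pooling choice are our own illustrative assumptions.

```python
import numpy as np

def decode(history, pool=2):
    """Post-process the recorded evolution (601) and reduce its
    dimensionality (602) by average-pooling over time, then vectorize
    the result into the reservoir output (603)."""
    T, n_cells = history.shape
    T_trim = (T // pool) * pool                    # drop a ragged tail, if any
    pooled = history[:T_trim].reshape(T_trim // pool, pool, n_cells).mean(axis=1)
    return pooled.flatten()

history = np.random.default_rng(1).integers(0, 2, size=(8, 16))  # T=8, 16 cells
vec = decode(history, pool=2)
```

Pooling halves the temporal resolution here; any of the other listed reductions (PCA, manifold learning) could be substituted at step 602 without changing the surrounding flow.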
The rule of the cellular automaton determines the computational power of the reservoir. A totally random approach, as in the case of classical echo state networks, can be adopted, such that a hybrid or a hierarchical cellular automaton with random rules is used. However, this network will be optimal neither for the data nor for the task assigned to the reservoir. An optimization method (Figure 7, 700) is proposed for estimating the best cellular automaton rule for a given dataset and task. In the first sub-step, the rule of the automaton is initialized
(701), based on whether it is an elementary, hybrid, hierarchical or any other type of automaton. The performance (i.e. classification accuracy, reconstruction error, etc.) of the automaton is evaluated (702) by using the cellular automaton reservoir for nonlinear projection, processing (105) the reservoir output (603), and computing an error metric on the overall system output (106). If a stopping criterion (703) for the optimization procedure is not met, the rule is modified (704); otherwise the optimization stops and outputs the final automaton rule (705). The modification sub-step (704) can be a search (any search algorithm) in the rule space of an elementary cellular automaton. Alternatively, evolutionary/genetic algorithms (Ganguly, 2004) can be used to evolve the rules of a hybrid automaton. Complexity metrics of the automaton rules, such as the Lyapunov exponent, Z parameter, G-density, in-degree length (Wuensche, 1999), entropy, and mutual information (Ganguly, 2004), can be computed to guide the rule search sub-step (704). In Figure 8, the sub-steps of 704 are shown. It accepts the current automaton rule (801) and generates candidate rules (802) according to the current rule. This candidate generation can follow genetic algorithm principles or use a set of rules with pre-computed complexity metrics (Lyapunov exponent, Z parameter, etc.). Next, the available complexity metrics are computed (803) and one of the rules is selected (804) based on the complexity of the current rule and of the candidate rules. The rule selection (804) can be memoryless, or can have memory of the previous rule selections. The memoryless approach selects a rule with a different degree of complexity, i.e., higher or lower complexity than the current rule. Selection with memory draws a decision based on previous rule selections as well as the performances of those selected rules.
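The rule search loop can be illustrated, for the elementary case where the rule space has only 256 members, by exhaustive evaluation. The performance metric below is a toy stand-in of our own devising (mean pairwise Hamming distance between reservoir outputs, as a proxy for input separation), not one of the metrics named above; a real deployment would plug in the task's error metric at step 702.

```python
import numpy as np

def evolve_eca(state, rule, steps):
    """Run a 1D elementary cellular automaton and return the final states."""
    table = [(rule >> n) & 1 for n in range(8)]
    for _ in range(steps):
        idx = 4 * np.roll(state, 1) + 2 * state + np.roll(state, -1)
        state = np.array([table[n] for n in idx], dtype=np.int8)
    return state

def rule_score(rule, inputs, steps=16):
    """Toy performance proxy (702): mean pairwise Hamming distance between
    the reservoir outputs of distinct inputs (higher = better separation)."""
    outs = [evolve_eca(x, rule, steps) for x in inputs]
    dists = [np.mean(a != b) for i, a in enumerate(outs) for b in outs[i + 1:]]
    return float(np.mean(dists))

rng = np.random.default_rng(2)
inputs = [rng.integers(0, 2, 32).astype(np.int8) for _ in range(6)]

# exhaustive, memoryless search (704) over the 256 elementary rules
best_rule = max(range(256), key=lambda r: rule_score(r, inputs))
```

For hybrid or hierarchical automata the rule space is far too large for exhaustion, which is where the genetic-algorithm and complexity-metric-guided variants of sub-step 704 become necessary.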
The proposed system utilizes a more structured reservoir, a cellular automaton, while exploiting the simplicity and power of reservoir computing principles (Lukosevicius and Jaeger, 2009; Maass, 2010). Cellular automata are easier to analyze and have guarantees of Turing completeness. For training the reservoir, a search in the rule space of automata is a more stable optimization procedure than gradient descent weight tuning in an echo state network. Additionally, cellular automata are extremely easy to implement in parallel hardware such as FPGAs, GPUs or VLSI.
References:
Siegelmann, H., and Sontag, E. (1995). On the computational power of neural nets. J. Comput. Systems Sci., 50, 132-150.
Funahashi, K., and Nakamura, Y. (1993). Approximation of dynamical systems by continuous time recurrent neural networks. Neural Networks, 6, 801-806.
Salakhutdinov, R. and Hinton, G. E. (2012). An efficient learning procedure for deep Boltzmann machines. Neural Computation, 24, 1967-2006.
Hochreiter, S. (1991). Untersuchungen zu dynamischen neuronalen Netzen. Diploma thesis, T.U. Munich.
Bengio, Y., Simard, P., and Frasconi, P. (1994). Learning long-term dependencies with gradient descent is difficult. IEEE T. Neural Networks, 5(2).
Hochreiter S., Schmidhuber, J. (1997). Long short-term memory. Neural Computation, 9, 1735-1780.
Doya, K. (1992). Bifurcations in the learning of recurrent neural networks. Proceedings of IEEE International Symposium on Circuits and Systems, 6, 2777-2780.
Maass, W., Natschlager, T., and Markram, H. (2002). Real-time computing without stable states: a new framework for neural computation based on perturbations. Neural Computation, 14(11), 2531-2560.
Jaeger, H. (2001). The echo state approach to analysing and training recurrent neural networks. Technical Report GMD Report 148, German National Research Center for Information Technology.
Lukosevicius, M., Jaeger, H. (2009). Reservoir computing approaches to recurrent neural network training. Computer Science Review, 3(3), 127-149.
Maass, W. (2010). Liquid state machines: motivation, theory, and applications. In Computability and Context: Computation and Logic in the Real World, B. Cooper and A. Sorbi, Eds. Imperial College Press.
Nils Bertschinger and Thomas Natschlager (2004). Real-time computation at the edge of chaos in recurrent neural networks. Neural Computation, 16(7).
Robert A. Legenstein and Wolfgang Maass (2007). Edge of chaos and prediction of computational performance for neural circuit models. Neural Networks, 20(3), 323-334.
Chrisantha Fernando and Sampsa Sojakka (2003). Pattern recognition in a bucket. In Proceedings of the 7th European Conference on Advances in Artificial Life (ECAL 2003), volume 2801 of LNCS, pages 588-597.
Adamatzky A. (2001). Computing in nonlinear media: make waves, study collisions. Lecture Notes in Artificial Intelligence. 2159, 1-11.
Adamatzky A. (2002). Experimental logical gates in a reaction-diffusion medium: The XOR gate and beyond. Physical Review E. 66, 046112.
Walmsley, I. (2001). Computing with interference: All-optical single-query 50-element database search. Conference on Lasers and Electro-Optics/Quantum Electronics and Laser Science.
Duport, F., Schneider, B., Smerieri, A., Haelterman, M., Massar, S. (2012). All optical reservoir computing. Opt. Express, 20, 22783.
Adamatzky (2004). Computing with Waves in Chemical Media: Massively Parallel Reaction-Diffusion Processors
Hinton, G.E., Osindero, S., Teh, Y (2006). A fast learning algorithm for deep belief nets. Neural Computation, 18, 1527-1554.
Benjamin Liebald (2004). Exploration of effects of different network topologies on the ESN signal cross-correlation matrix spectrum. Bachelor's thesis, Jacobs University Bremen.
Yanbo Xue, Le Yang, and Simon Haykin (2007). Decoupled echo state networks with lateral inhibition. Neural Networks, 20(3), 365-376.
Wolfgang Maass, Robert A. Legenstein, and Nils Bertschinger (2004). Methods for estimating the computational power and generalization capability of neural microcircuits. In Advances in Neural Information Processing Systems.
David Verstraeten, Benjamin Schrauwen, Michiel D'Haene, and Dirk Stroobandt (2007). An experimental unification of reservoir computing methods. Neural Networks, 20(3), 391-403.
Cook, Matthew (2004). Universality in Elementary Cellular Automata. Complex Systems, 15, 1-40.
Zenil, Hector (2010). Compression-based investigation of the dynamical properties of cellular automata and other systems. Complex Systems, 19(1).
Wolfram, Stephen (2002). A New Kind of Science. Wolfram Media.
M. Chady and R. Poli (1997). Evolution of Cellular-Automaton-based Associative Memories. Technical Report no. CSRP-97-15.
P. Tzionas, P. Tsalides, and A. Thanailakis (1994). A New Cellular Automaton-based Nearest Neighbor Pattern Classifier and its VLSI Implementation. IEEE Trans. on VLSI Systems, 2(3), 343-353.
J. J. Hopfield (1982). Neural Networks and Physical Systems with Emergent Collective Computational Abilities. Proceedings of the National Academy of Sciences, 79, 2554-2558.
N. Ganguly, B.K. Sikdar, A. Deutsch, G. Canright, and P.P. Chaudhuri (2003). A survey on cellular automata. Technical Report 9, Centre for High Performance Computing, Dresden University of Technology.
Niloy Ganguly (2004). Cellular Automata Evolution: Theory and Applications in Pattern Recognition and Classification. Ph.D. thesis, CST Dept., BECDU, India.
L. O. Chua and L. Yang (1988). Cellular Neural Networks: Applications. IEEE Trans. on Circuits and Systems, 35(10), 1273-1290.
L. O. Chua and L. Yang (1988). Cellular Neural Networks: Theory. IEEE Trans. on Circuits and Systems, 35(10), 1257-1272.
J. Austin, J. Kennedy, S. Buckle, A. Moulds, and R. Pack (1997). The Cellular Neural Network Associative Processor, C-NNAP. IEEE Monograph on Associative Computers.
J. M. Baetens and B. De Baets (2010). Phenomenological Study of Irregular Cellular Automata Based on Lyapunov Exponents and Jacobians, Chaos, 20, 033112.
J. M. Baetens and B. De Baets (2011). On the Topological Sensitivity of Cellular Automata. Chaos.
F. Bagnoli, R. Rechtman, and S. Ruffo (1992). Damage Spreading and Lyapunov Exponents in Cellular Automata. Physics Letters A, 172, 34-38.
Sipper, M., Tomassini, M., Capcarrere, M.S. (1997). Evolving asynchronous and scalable non-uniform cellular automata. In Smith, G.D., Steele, N.C., Albrecht, R.F., eds.: Proceedings of International Conference on Artificial Neural Networks and Genetic Algorithms (ICANNGA97).
B. K. Sikdar, P. Majumder, M. Mukherjee, N. Ganguly, D. K. Das, and P. Pal Chaudhuri (2001). Hierarchical Cellular Automata as An On-Chip Test Pattern Generator. In Proc. Intl. Conf. on VLSI Design, India, 403-408.
A. Wuensche (1999). Classifying Cellular Automata Automatically. Complexity, 4(3), 47-66.
De Garis, H., Hemmi, H. (1998). European Patent No. EP0649099 B1.
Manganaro, G., Lavorgna, M, Lo, P.M., Fortuna, L. (1997). European Patent No. EP0797165 A1.
Olurinde E.L. (2003). US Patent No. US6567781 B1.
Olurinde E.L. (1997). WIPO Patent No. WO1997012330 A1.
Clark, D., Pieslak, B., Gipson, B., Walton, Z. (2013). US Patent No. US 20130060772 A1.
Clark, D., Gipson, B., Pieslak, B., Walton, Z. (2012). US Patent No. US 8301628 B2.
Bellens, R., Gautama, S. (2011). European Patent No. EP 2389669 A1.