Browsing by Author "Giles, C. Lee"
Now showing 1 - 20 of 24
Item: Alternative Discrete-Time Operators and Their Application to Nonlinear Models (1998-10-15)
Authors: Back, Andrew D.; Tsoi, Ah Chung; Horne, Bill G.; Giles, C. Lee

The shift operator, defined as q x(t) = x(t+1), is the basis for almost all discrete-time models. It has been shown, however, that linear models based on the shift operator suffer problems when used to model lightly-damped low-frequency (LDLF) systems, with poles near (1, 0) on the unit circle in the complex plane. This problem occurs under fast sampling conditions. As the sampling rate increases, coefficient sensitivity and round-off noise become a problem because the difference between successive sampled inputs becomes smaller and smaller. The resulting coefficients of the model approach the coefficients obtained in a binomial expansion, regardless of the underlying continuous-time system. This implies that for a given finite wordlength, severe inaccuracies may result. Wordlengths for the coefficients may also need to be made longer to accommodate models which have low-frequency characteristics, corresponding to poles in the neighbourhood of (1, 0). These problems also arise in neural network models which comprise linear parts and nonlinear neural activation functions. Various alternative discrete-time operators can be introduced which offer advantages in numerical computation over the conventional shift operator. These alternative operators were proposed independently of each other in the fields of digital filtering, adaptive control, and neural networks; they include the delta, rho, gamma, and bilinear operators. In this paper we first review these operators and examine some of their properties. An analysis of the TDNN and FIR MLP network structures is given which shows their susceptibility to parameter sensitivity problems. Subsequently, it is shown that models may be formulated using alternative discrete-time operators which have low sensitivity properties.
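The sensitivity issue described above can be illustrated in a few lines. The sketch below is not from the paper; the coefficient values are made up. It realizes the same first-order filter with the shift operator q and with the delta operator d = (q - 1)/T: for a pole near (1, 0), the shift-form coefficient is nearly 1, while the delta-form coefficient stays well scaled.

```python
# Sketch (not from the paper): one first-order filter, two realizations.

def filter_shift(u, a, b):
    # y(t+1) = a*y(t) + b*u(t)   (shift-operator realization)
    y, out = 0.0, []
    for x in u:
        out.append(y)
        y = a * y + b * x
    return out

def filter_delta(u, a, b, T):
    # Same filter via the delta operator d = (q - 1)/T:
    # d y = alpha*y + beta*u  =>  y(t+1) = y(t) + T*(alpha*y(t) + beta*u(t))
    alpha, beta = (a - 1.0) / T, b / T
    y, out = 0.0, []
    for x in u:
        out.append(y)
        y = y + T * (alpha * y + beta * x)
    return out

T = 0.001                      # fast sampling period (illustrative)
a, b = 0.999, 0.001            # pole near (1, 0), as in the LDLF case
u = [1.0] * 50                 # step input
ys = filter_shift(u, a, b)
yd = filter_delta(u, a, b, T)  # algebraically the same trajectory
```

The two realizations are algebraically identical, but the shift form stores the near-unity coefficient a = 0.999 directly, while the delta form stores alpha = (a - 1)/T = -1.0, which is far less sensitive to coefficient rounding at short wordlengths.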
Consideration is given to the problem of finding parameters for stable alternative discrete-time operators. A learning algorithm which adapts the parameters of the alternative discrete-time operators on-line is presented for MLP neural network models based on these operators. It is shown that neural network models which use these alternative discrete-time operators perform better than those using the shift operator alone. (Also cross-referenced as UMIACS-TR-97-03)

Item: Computational Capabilities of Recurrent NARX Neural Networks (1998-10-15)
Authors: Siegelmann, Hava T.; Horne, Bill G.; Giles, C. Lee

Recently, fully connected recurrent neural networks have been proven to be computationally rich: at least as powerful as Turing machines. This work focuses on another network which is popular in control applications and has been found to be very effective at learning a variety of problems. These networks are based upon Nonlinear AutoRegressive models with eXogenous Inputs (NARX models), and are therefore called NARX networks. As opposed to other recurrent networks, NARX networks have a limited feedback which comes only from the output neuron rather than from hidden states. They are formalized by
\[ y(t) = \Psi\left( u(t-n_u), \ldots, u(t-1), u(t), y(t-n_y), \ldots, y(t-1) \right), \]
where $u(t)$ and $y(t)$ represent the input and output of the network at time $t$, $n_u$ and $n_y$ are the input and output orders, and the function $\Psi$ is the mapping performed by a Multilayer Perceptron. We constructively prove that NARX networks with a finite number of parameters are computationally as strong as fully connected recurrent networks and thus Turing machines. We conclude that in theory one can use NARX models rather than conventional recurrent networks without any computational loss, even though their feedback is limited.
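The NARX recurrence above can be sketched directly: the tapped input and output delay lines are concatenated and fed through a small MLP. The weights, orders, and values below are illustrative assumptions, not from the paper.

```python
import math

def narx_step(u_hist, y_hist, W_hid, b_hid, w_out, b_out):
    """One NARX update: y(t) = Psi(u(t-n_u), ..., u(t), y(t-n_y), ..., y(t-1)),
    with Psi a one-hidden-layer MLP (tanh hidden units, linear output)."""
    x = list(u_hist) + list(y_hist)          # tapped delay lines feed the MLP
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + bh)
         for row, bh in zip(W_hid, b_hid)]
    return sum(w * hi for w, hi in zip(w_out, h)) + b_out

# Illustrative call with n_u = 1, n_y = 1 (two input taps, one output tap)
# and a single hidden unit; all weights are made up.
y_t = narx_step([1.0, 1.0], [0.0], W_hid=[[0.5, 0.5, 0.5]],
                b_hid=[0.0], w_out=[1.0], b_out=0.0)
```

Note how the only feedback is the output history `y_hist`; there is no hidden-state recurrence, which is exactly the limited feedback the abstract refers to.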
Furthermore, these results raise the issue of what amount of feedback or recurrence is necessary for any network to be Turing equivalent, and what restrictions on feedback limit computational power. (Also cross-referenced as UMIACS-TR-95-12)

Item: Constructing Deterministic Finite-State Automata in Recurrent Neural Networks (1998-10-15)
Authors: Omlin, Christian W.; Giles, C. Lee

Recurrent neural networks that are trained to behave like deterministic finite-state automata (DFAs) can show deteriorating performance when tested on long strings. This deteriorating performance can be attributed to the instability of the internal representation of the learned DFA states. The use of a sigmoidal discriminant function together with the recurrent structure contributes to this instability. We prove that a simple algorithm can construct second-order recurrent neural networks with a sparse interconnection topology and sigmoidal discriminant function such that the internal DFA state representations are stable, i.e. the constructed network correctly classifies strings of arbitrary length. The algorithm is based on encoding the strengths of weights directly into the neural network. We derive a relationship between the weight strength and the number of DFA states for robust string classification. For a DFA with $n$ states and $m$ input alphabet symbols, the constructive algorithm generates a "programmed" neural network with $O(n)$ neurons and $O(mn)$ weights. We compare our algorithm to other methods proposed in the literature. Revised in February 1996. (Also cross-referenced as UMIACS-TR-95-50)

Item: A Delay Damage Model Selection Algorithm for NARX Neural Networks (1998-10-15)
Authors: Lin, Tsungnan; Giles, C. Lee; Horne, Bill G.; Kung, Sun-Yang

Recurrent neural networks have become popular models for system identification and time series prediction.
NARX (Nonlinear AutoRegressive models with eXogenous inputs) neural network models are a popular subclass of recurrent networks and have been used in many applications. Though embedded memory can be found in all recurrent network models, it is particularly prominent in NARX models. We show that using intelligent memory order selection through pruning and good initial heuristics significantly improves the generalization and predictive performance of these nonlinear systems on problems as diverse as grammatical inference and time series prediction. (Also cross-referenced as UMIACS-TR-96-77)

Item: Extraction of Rules from Discrete-Time Recurrent Neural Networks (1998-10-15)
Authors: Omlin, Christian W.; Giles, C. Lee

The extraction of symbolic knowledge from trained neural networks and the direct encoding of (partial) knowledge into networks prior to training are important issues. They allow the exchange of information between symbolic and connectionist knowledge representations. The focus of this paper is on the quality of the rules that are extracted from recurrent neural networks. Discrete-time recurrent neural networks can be trained to correctly classify strings of a regular language. Rules defining the learned grammar can be extracted from networks in the form of deterministic finite-state automata (DFAs) by applying clustering algorithms in the output space of recurrent state neurons. Our algorithm can extract different finite-state automata that are consistent with a training set from the same network. We compare the generalization performances of these different models and the trained network, and we introduce a heuristic that permits us to choose among the consistent DFAs the model which best approximates the learned regular grammar. (Also cross-referenced as UMIACS-TR-95-54)

Item: Face Recognition: A Hybrid Neural Network Approach (1998-10-15)
Authors: Lawrence, Steve; Giles, C. Lee; Tsoi, Ah Chung; Back, Andrew D.

Faces represent complex, multidimensional, meaningful visual stimuli, and developing a computational model for face recognition is difficult. We present a hybrid neural network solution which compares favorably with other methods. The system combines local image sampling, a self-organizing map neural network, and a convolutional neural network. The self-organizing map provides a quantization of the image samples into a topological space where inputs that are nearby in the original space are also nearby in the output space, thereby providing dimensionality reduction and invariance to minor changes in the image sample, and the convolutional neural network provides for partial invariance to translation, rotation, scale, and deformation. The convolutional network extracts successively larger features in a hierarchical set of layers. We present results using the Karhunen-Loeve transform in place of the self-organizing map, and a multi-layer perceptron in place of the convolutional network. The Karhunen-Loeve transform performs almost as well (5.3% error versus 3.8%). The multi-layer perceptron performs very poorly (40% error versus 3.8%). The method is capable of rapid classification, requires only fast, approximate normalization and preprocessing, and consistently exhibits better classification performance than the eigenfaces approach on the database considered as the number of images per person in the training database is varied from 1 to 5. With 5 images per person, the proposed method and eigenfaces result in 3.8% and 10.5% error respectively. The recognizer provides a measure of confidence in its output, and classification error approaches zero when rejecting as few as 10% of the examples. We use a database of 400 images of 40 individuals which contains quite a high degree of variability in expression, pose, and facial details. We analyze computational complexity and discuss how new classes could be added to the trained recognizer.
(Also cross-referenced as UMIACS-TR-96-16)

Item: Finite State Machines and Recurrent Neural Networks -- Automata and Dynamical Systems Approaches (1998-10-15)
Authors: Tino, Peter; Horne, Bill G.; Giles, C. Lee

We present two approaches to the analysis of the relationship between a recurrent neural network (RNN) and the finite state machine \( {\cal M} \) the network is able to exactly mimic. First, the network is treated as a state machine, and the relationship between the RNN and \( {\cal M} \) is established in the context of the algebraic theory of automata. In the second approach, the RNN is viewed as a set of discrete-time dynamical systems associated with the input symbols of \( {\cal M} \). In particular, issues concerning network representation of loops and cycles in the state transition diagram of \( {\cal M} \) are shown to provide a basis for the interpretation of the learning process from the point of view of bifurcation analysis. The circumstances under which a loop corresponding to an input symbol \( x \) is represented by an attractive fixed point of the underlying dynamical system associated with \( x \) are investigated. For the case of two recurrent neurons, under some assumptions on weight values, bifurcations can be understood in the geometrical context of the intersection of increasing and decreasing parts of the curves defining fixed points. The most typical bifurcation responsible for the creation of a new fixed point is the saddle-node bifurcation. (Also cross-referenced as UMIACS-TR-95-1)

Item: Fixed Points in Two-Neuron Discrete Time Recurrent Networks: Stability and Bifurcation Considerations (1998-10-15)
Authors: Tino, Peter; Horne, Bill G.; Giles, C. Lee

The position, number, and stability types of fixed points of a two-neuron recurrent network with nonzero weights are investigated.
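Such a two-neuron network is easy to iterate numerically. The sketch below uses made-up weights in which both neurons self-excite and mutually excite each other; two different initial states converge to two distinct attractive fixed points, one low and one near saturation.

```python
import math

# Illustrative weights (not from the paper): self-excitation w11, w22 > 0
# and mutual excitation w12, w21 > 0, with negative biases.
w11, w12, w21, w22 = 5.0, 1.0, 1.0, 5.0
b1, b2 = -3.0, -3.0

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def step(x1, x2):
    """One synchronous update of the two-neuron recurrent network."""
    return (sigmoid(w11 * x1 + w12 * x2 + b1),
            sigmoid(w21 * x1 + w22 * x2 + b2))

# Iterate from two initial states; each settles on an attractive fixed point.
fps = []
for x0 in [(0.05, 0.05), (0.95, 0.95)]:
    x = x0
    for _ in range(200):
        x = step(*x)
    fps.append(x)
```

With these symmetric weights the dynamics along the diagonal reduce to the scalar map x = sigmoid(6x - 3), which has two stable fixed points separated by an unstable one at 0.5; this is the three-fixed-point configuration whose creation the abstract attributes to a saddle-node bifurcation.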
Using simple geometrical arguments in the space of derivatives of the sigmoid transfer function with respect to the weighted sum of neuron inputs, we partition the network state space into several regions corresponding to stability types of the fixed points. If the neurons have the same mutual interaction pattern, i.e. they either mutually inhibit or mutually excite each other, a lower bound is given on the rate of convergence of the attractive fixed points towards the saturation values as the absolute values of the weights on the self-loops grow. The role of weights in the location of fixed points is explored through an intuitively appealing characterization of neurons according to their inhibition/excitation performance in the network. In particular, each neuron can be of one of four types: greedy, enthusiastic, altruistic, or depressed. Both with and without external inhibition/excitation sources, we investigate the position and number of fixed points according to the character of the neurons. When both neurons self-excite (or self-inhibit) and have the same mutual interaction pattern, the mechanism of creation of a new attractive fixed point is shown to be that of saddle-node bifurcation. (Also cross-referenced as UMIACS-TR-95-51)

Item: Flexible User Profiles for Large Scale Data Delivery (1999-03-30)
Authors: Cetintemel, Ugur; Franklin, Michael J.; Giles, C. Lee

Push-based data delivery requires knowledge of user interests for making scheduling, bandwidth allocation, and routing decisions. Such information is maintained as user profiles. We propose a new incremental algorithm for constructing user profiles based on monitoring and user feedback. In contrast to earlier approaches, which typically represent profiles as a single weighted interest vector, we represent user profiles using multiple interest clusters whose number, size, and elements change adaptively based on user access behavior.
This flexible approach allows the profile to represent complex user interests more accurately. The approach can be tuned to trade off profile complexity and effectiveness, making it suitable for use in large-scale information filtering applications such as push-based WWW page dissemination. We evaluate the method by experimentally investigating its ability to categorize WWW pages taken from Yahoo! categories. Our results show that the method can provide high retrieval effectiveness with modest profile sizes and can effectively adapt to changes in users' interests. (Also cross-referenced as UMIACS-TR-99-18)

Item: Fuzzy Finite-state Automata Can Be Deterministically Encoded into Recurrent Neural Networks (1998-10-15)
Authors: Omlin, Christian W.; Thornber, Karvel K.; Giles, C. Lee

There has been increased interest in combining fuzzy systems with neural networks because fuzzy neural systems merge the advantages of both paradigms. On the one hand, parameters in fuzzy systems have clear physical meanings, and rule-based and linguistic information can be incorporated into adaptive fuzzy systems in a systematic way. On the other hand, there exist powerful algorithms for training various neural network models. However, most of the proposed combined architectures are only able to process static input-output relationships, i.e. they are not able to process temporal input sequences of arbitrary length. Fuzzy finite-state automata (FFAs) can model dynamical processes whose current state depends on the current input and previous states. Unlike deterministic finite-state automata (DFAs), FFAs are not in one particular state; rather, each state is occupied to some degree defined by a membership function. Based on previous work on encoding DFAs in discrete-time, second-order recurrent neural networks, we propose an algorithm that constructs an augmented recurrent neural network that encodes an FFA and recognizes a given fuzzy regular language with arbitrary accuracy.
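The crisp-DFA encoding this construction builds on (also the subject of the "Constructing Deterministic Finite-State Automata" item above) can be sketched in miniature. The DFA, the weight strength H, and the bias scheme below are illustrative assumptions, not the papers' exact parameters: second-order weights W[i][j][k] are programmed to +H when the DFA moves from state j to state i on symbol k, and to -H otherwise.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical 2-state DFA over {0, 1} accepting strings with an even
# number of 1s: delta[(state, symbol)] -> next state.
delta = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}
accept = {0}
n_states, n_symbols, H = 2, 2, 8.0       # H = programmed weight strength

W = [[[H if delta[(j, k)] == i else -H for k in range(n_symbols)]
      for j in range(n_states)] for i in range(n_states)]

def classify(string):
    """Run the programmed second-order network; one state neuron per DFA state."""
    S = [1.0, 0.0]                        # start state neuron switched on
    for ch in string:
        k = int(ch)
        S = [sigmoid(sum(W[i][j][k] * S[j] for j in range(n_states)) - H / 2)
             for i in range(n_states)]
    return max(range(n_states), key=lambda i: S[i]) in accept
```

With H = 8 the dominant state neuron stays near saturation at every step, so classification remains correct on strings of arbitrary length, which is the stability property the construction is designed to guarantee.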
We then empirically verify the encoding methodology by measuring the string recognition performance of recurrent neural networks which encode large, randomly generated FFAs. In particular, we examine how the networks' performance varies as a function of synaptic weight strength. (Also cross-referenced as UMIACS-TR-96-12)

Item: How Embedded Memory in Recurrent Neural Network Architectures Helps Learning Long-term Dependencies (1998-10-15)
Authors: Lin, Tsungnan; Horne, Bill G.; Giles, C. Lee

Learning long-term temporal dependencies with recurrent neural networks can be a difficult problem. It has recently been shown that a class of recurrent neural networks called NARX networks perform much better than conventional recurrent neural networks at learning certain simple long-term dependency problems. The intuitive explanation for this behavior is that the output memories of a NARX network can be manifested as jump-ahead connections in the time-unfolded network. These jump-ahead connections can propagate gradient information more efficiently, thus reducing the sensitivity of the network to long-term dependencies. This work gives empirical justification to our hypothesis that similar improvements in learning long-term dependencies can be achieved with other classes of recurrent neural network architectures simply by increasing the order of the embedded memory. In particular, we explore the impact of learning simple long-term dependency problems on three classes of recurrent neural network architectures: globally recurrent networks, locally recurrent networks, and NARX (output feedback) networks. Comparing the performance of these architectures with different orders of embedded memory on two simple long-term dependency problems shows that all of these classes of network architectures demonstrate significant improvement in learning long-term dependencies when the order of the embedded memory is increased.
These results can be important to a user comfortable with a specific recurrent neural network architecture, because simply increasing the embedded memory order will make the architecture more robust to the problem of long-term dependency learning. (Also cross-referenced as UMIACS-TR-96-28)

Item: Learning a Class of Large Finite State Machines with a Recurrent Neural Network (1998-10-15)
Authors: Giles, C. Lee; Horne, Bill G.; Lin, T.

One of the issues in any learning model is how it scales with problem size. Neural networks have not been immune to scaling issues. We show that a dynamically-driven discrete-time recurrent network (DRNN) can learn rather large grammatical inference problems when the strings of a finite memory machine (FMM) are encoded as temporal sequences. FMMs are a subclass of finite state machines which have a finite memory, or a finite order of inputs and outputs. The DRNN that learns the FMM is a neural network that maps directly from the sequential machine implementation of the FMM. It has feedback only from the output and not from any hidden units; an example is the recurrent network of Narendra and Parthasarathy. (FMMs that have zero order in the feedback of outputs are called definite memory machines and are analogous to time-delay or Finite Impulse Response neural networks.) Due to their topology, these DRNNs are at least as powerful as any sequential machine implementation of an FMM and should be capable of representing any FMM. We choose to learn particular FMMs: specifically, FMMs that have a large number of states (simulations are for 256- and 512-state FMMs) but minimal order, relatively small depth, and little logic when the FMM is implemented as a sequential machine. Simulations of the number of training examples versus generalization performance and FMM extraction size show that the number of training samples necessary for perfect generalization is less than that necessary to completely characterize the FMM to be learned.
This is in a sense a best-case learning problem, since any arbitrarily chosen FMM with a minimal number of states would have much more order and string depth, and would most likely require more logic in its sequential machine implementation. (Also cross-referenced as UMIACS-TR-94-94)

Item: Learning Long-Term Dependencies is Not as Difficult With NARX Recurrent Neural Networks (1998-10-15)
Authors: Lin, Tsungnan; Horne, Bill G.; Tino, Peter; Giles, C. Lee

It has recently been shown that gradient descent learning algorithms for recurrent neural networks can perform poorly on tasks that involve long-term dependencies, i.e. those problems for which the desired output depends on inputs presented at times far in the past. In this paper we explore the long-term dependencies problem for a class of architectures called NARX recurrent neural networks, which have powerful representational capabilities. We have previously reported that gradient descent learning is more effective in NARX networks than in recurrent neural network architectures that have "hidden states" on problems including grammatical inference and nonlinear system identification. Typically, the network converges much faster and generalizes better than other networks. The results in this paper are an attempt to explain this phenomenon. We present some experimental results which show that NARX networks can often retain information for two to three times as long as conventional recurrent neural networks. We show that although NARX networks do not circumvent the problem of long-term dependencies, they can greatly improve performance on long-term dependency problems. We also describe in detail some of the assumptions regarding what it means to latch information robustly, and suggest possible ways to loosen these assumptions. (Also cross-referenced as UMIACS-TR-95-78)

Item: Neural Learning of Chaotic Dynamics: The Error Propagation Algorithm (1998-10-15)
Authors: Bakker, Rembrandt; Schouten, Jaap C.; Bleek, Cor M. van den; Giles, C. Lee

An algorithm is introduced that trains a neural network to identify chaotic dynamics from a single measured time series. The algorithm has four special features:
1. The state of the system is extracted from the time series using delays, followed by weighted Principal Component Analysis (PCA) data reduction.
2. The prediction model consists of both a linear model and a Multi-Layer Perceptron (MLP).
3. The effective prediction horizon during training is user-adjustable due to error propagation: prediction errors are partially propagated to the next time step.
4. A criterion is monitored during training to select the model whose chaotic attractor is most similar to the real system's attractor.
The algorithm is applied to laser data from the Santa Fe time-series competition (set A). The resulting model is not only useful for short-term predictions, but it also generates time series with chaotic characteristics similar to those of the measured data. (Also cross-referenced as UMIACS-TR-97-77)

Item: The Neural Network Pushdown Automaton: Model, Stack and Learning Simulations (1998-10-15)
Authors: Sun, G.Z.; Giles, C. Lee; Chen, H.H.; Lee, Y.C.

In order for neural networks to learn complex languages or grammars, they must have sufficient computational power or resources to recognize or generate such languages. Though many approaches have been discussed, one obvious approach to enhancing the processing power of a recurrent neural network is to couple it with an external stack memory, in effect creating a neural network pushdown automaton (NNPDA). This paper discusses this NNPDA in detail: its construction, how it can be trained, and how useful symbolic information can be extracted from the trained network. In order to couple the external stack to the neural network, an optimization method is developed which uses an error function that connects the learning of the state automaton of the neural network to the learning of the operation of the external stack.
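The idea of a continuous (analog) stack can be sketched as follows. This is a simplified data structure in the spirit of the abstract, not the paper's exact formulation: each push stores a symbol with a real-valued "thickness" in [0, 1], pop removes a real-valued amount from the top (possibly consuming several partial entries), and reading the top returns a weighted mix of symbols.

```python
# Sketch of a continuous stack; names and the "thickness" bookkeeping
# are illustrative, not the paper's exact scheme.

class ContinuousStack:
    def __init__(self):
        self.items = []                  # list of [symbol, thickness]

    def push(self, symbol, amount):
        if amount > 0:
            self.items.append([symbol, amount])

    def pop(self, amount):
        """Remove `amount` units from the top; may span several entries."""
        removed = []
        while amount > 1e-12 and self.items:
            sym, thick = self.items[-1]
            take = min(amount, thick)
            removed.append((sym, take))
            amount -= take
            if take >= thick - 1e-12:
                self.items.pop()
            else:
                self.items[-1][1] = thick - take
        return removed

    def top(self, depth=1.0):
        """Read the top `depth` units as a weighted mix of symbols."""
        mix, need = {}, depth
        for sym, thick in reversed(self.items):
            take = min(need, thick)
            mix[sym] = mix.get(sym, 0.0) + take
            need -= take
            if need <= 1e-12:
                break
        return mix

s = ContinuousStack()
s.push('a', 0.6)
s.push('b', 0.7)
reading = s.top(1.0)     # weighted mix: 0.7 of 'b', 0.3 of 'a'
s.pop(0.9)               # consumes all of 'b' and 0.2 of 'a'
```

Because push, pop, and read amounts are continuous, the stack operations are differentiable in the amounts, which is what makes gradient descent training of the coupled network possible; the probabilistic-storage interpretation in the abstract corresponds to the fractional thicknesses here.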
To minimize the error function using gradient descent learning, an analog stack is designed such that the action and storage of information in the stack are continuous. One interpretation of a continuous stack is the probabilistic storage of, and action on, data. After training on sample strings of an unknown source grammar, a quantization procedure extracts from the analog stack and neural network a discrete pushdown automaton (PDA). Simulations show that in learning deterministic context-free grammars (the balanced parenthesis language, $1^n 0^n$, and the deterministic palindrome), the extracted PDA is correct in the sense that it can correctly recognize unseen strings of arbitrary length. In addition, the extracted PDAs can be shown to be identical or equivalent to the PDAs of the source grammars which were used to generate the training strings. (Also cross-referenced as UMIACS-TR-93-77)

Item: Noisy Time Series Prediction using Symbolic Representation and Recurrent Neural Network Grammatical Inference (1998-10-15)
Authors: Lawrence, Steve; Tsoi, Ah Chung; Giles, C. Lee

Financial forecasting is an example of a signal processing problem which is challenging due to small sample sizes, high noise, non-stationarity, and non-linearity. Neural networks have been very successful in a number of signal processing applications. We discuss fundamental limitations and inherent difficulties when using neural networks for the processing of high-noise, small-sample-size signals. We introduce a new intelligent signal processing method which addresses these difficulties. The method uses conversion into a symbolic representation with a self-organizing map, and grammatical inference with recurrent neural networks. We apply the method to the prediction of daily foreign exchange rates, addressing difficulties with non-stationarity, overfitting, and unequal a priori class probabilities, and we find significant predictability in comprehensive experiments covering 5 different foreign exchange rates.
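The conversion-to-symbols step can be illustrated with much simpler machinery than the paper uses: where the paper quantizes with a self-organizing map, the stand-in sketch below quantizes daily returns into a small symbolic alphabet with equal-frequency (quantile) bins. The price series and alphabet size are made up for illustration.

```python
# Stand-in for the paper's SOM quantization: equal-frequency binning of
# returns into n_symbols symbols (same idea, simpler machinery).

def to_symbols(returns, n_symbols=3):
    ranked = sorted(returns)
    # bin edges at the 1/n, 2/n, ... quantiles
    edges = [ranked[int(len(ranked) * i / n_symbols)]
             for i in range(1, n_symbols)]
    # symbol = index of the bin containing each return
    return [sum(r >= e for e in edges) for r in returns]

prices = [100, 101, 100.5, 102, 101.5, 103, 102.0, 104]
returns = [b / a - 1 for a, b in zip(prices, prices[1:])]
syms = to_symbols(returns, 3)        # e.g. 0 = down, 1 = flat, 2 = up
```

The resulting symbol sequence is what a recurrent network would then consume for grammatical inference; the SOM in the paper plays the same role but learns the quantization regions from the data's topology rather than from rank order.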
The method correctly predicts the direction of change for the next day with an error rate of 47.1%. The error rate reduces to around 40% when rejecting examples where the system has low confidence in its prediction. The symbolic representation aids the extraction of symbolic knowledge from the recurrent neural networks in the form of deterministic finite-state automata. These automata explain the operation of the system and are often relatively simple. Rules related to well-known behavior such as trend following and mean reversal are extracted. (Also cross-referenced as UMIACS-TR-96-27)

Item: On the Applicability of Neural Network and Machine Learning Methodologies to Natural Language Processing (1998-10-15)
Authors: Lawrence, Steve; Giles, C. Lee; Fong, Sandiway

We examine the inductive inference of a complex grammar. Specifically, we consider the task of training a model to classify natural language sentences as grammatical or ungrammatical, thereby exhibiting the same kind of discriminatory power provided by the Principles and Parameters linguistic framework, or Government-and-Binding theory. We investigate the following models: feed-forward neural networks, Frasconi-Gori-Soda and Back-Tsoi locally recurrent networks, Elman, Narendra & Parthasarathy, and Williams & Zipser recurrent networks, Euclidean- and edit-distance nearest-neighbors, simulated annealing, and decision trees. The feed-forward neural networks and non-neural-network machine learning models are included primarily for comparison. We address the question: How can a neural network, with its distributed nature and gradient-descent-based iterative calculations, possess linguistic capability which is traditionally handled with symbolic computation and recursive processes? Initial simulations with all models were only partially successful when using a large temporal window as input. Models trained in this fashion did not learn the grammar to a significant degree.
Attempts at training recurrent networks with small temporal input windows failed until we implemented several techniques aimed at improving the convergence of the gradient descent training algorithms. We discuss the theory and present an empirical study of a variety of models and learning algorithms which highlights behaviour not present when attempting to learn a simpler grammar. (Also cross-referenced as UMIACS-TR-95-64)

Item: Performance of On-Line Learning Methods in Predicting Multiprocessor Memory Access Patterns (1998-10-15)
Authors: Sakr, Majd F.; Levitan, Steven P.; Chiarulli, Donald M.; Horne, Bill G.; Giles, C. Lee

Shared memory multiprocessors require reconfigurable interconnection networks (INs) for scalability. These INs are reconfigured by an IN control unit. However, these INs are often plagued by undesirable reconfiguration time that is primarily due to control latency, the amount of time the control unit takes to decide on a desired new IN configuration. To reduce control latency, a trainable prediction unit (PU) was devised and added to the IN controller. The PU's job is to anticipate and reduce control configuration time, the major component of the control latency. Three different on-line prediction techniques were tested to learn and predict repetitive memory access patterns for three typical parallel processing applications: the 2-D relaxation algorithm, matrix multiply, and the Fast Fourier Transform. The predictions were then used by a routing control algorithm to reduce control latency by configuring the IN to provide the needed memory access paths before they were requested. The three prediction techniques were: (1) a Markov predictor, (2) a linear predictor, and (3) a time-delay neural network (TDNN) predictor. As expected, different predictors performed best on different applications; however, the TDNN produced the best overall results.
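A first-order Markov predictor of the kind listed above is only a few lines. The sketch below is an illustrative stand-in, not the paper's implementation: it predicts the next memory access as the most frequent successor of the current one seen so far, learning on-line as the trace streams in.

```python
from collections import defaultdict

class MarkovPredictor:
    """On-line first-order Markov predictor over a stream of accesses."""
    def __init__(self):
        self.counts = defaultdict(lambda: defaultdict(int))
        self.prev = None

    def observe(self, access):
        if self.prev is not None:
            self.counts[self.prev][access] += 1
        self.prev = access

    def predict(self):
        succ = self.counts.get(self.prev)
        if not succ:
            return None                  # nothing learned for this context yet
        return max(succ, key=succ.get)   # most frequent successor so far

trace = [0, 1, 2, 3] * 5                 # a repetitive access pattern
p = MarkovPredictor()
hits = 0
for a in trace:
    if p.predict() == a:                 # predict before seeing the access
        hits += 1
    p.observe(a)
```

On this 20-access trace the predictor misses the whole first cycle and one transition of the second while it learns the pattern, then predicts perfectly; this is the on-line behavior that lets the controller pre-configure access paths before they are requested.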
(Also cross-referenced as UMIACS-TR-96-59)

Item: Product Unit Learning (1998-10-15)
Authors: Leerink, Laurens R.; Giles, C. Lee; Horne, Bill G.; Jabri, Marwan A.

Product units provide a method of automatically learning the higher-order input combinations required for the efficient synthesis of Boolean logic functions by neural networks. Product units also have a higher information capacity than sigmoidal networks. However, this activation function has not received much attention in the literature. A possible reason for this is that one encounters some problems when using standard backpropagation to train networks containing these units. This report examines these problems and evaluates the performance of three training algorithms on networks of this type. Empirical results indicate that the error surface of networks containing product units has more local minima than that of corresponding networks with summation units. For this reason, a combination of local and global training algorithms was found to provide the most reliable convergence. We then investigate how 'hints' can be added to the training algorithm. By extracting a common frequency from the input weights, and training this frequency separately, we show that convergence can be accelerated. A constructive algorithm is then introduced which adds product units to a network as required by the problem. Simulations show that for the same problems this method creates a network with significantly fewer neurons than those constructed by the tiling and upstart algorithms. In order to compare their performance with other transfer functions, product units were implemented as candidate units in the Cascade Correlation (CC) \cite{Fahlman90} system. Using these candidate units resulted in smaller networks which trained faster than when any of the standard (three sigmoidal types and one Gaussian) transfer functions were used.
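For concreteness: a product unit computes a weighted product of its inputs rather than a weighted sum. The minimal sketch below assumes positive inputs (handling zero or negative inputs needs extra care, which is part of the training difficulty the abstract mentions); the weights are illustrative.

```python
import math

def product_unit(x, w):
    """y = prod_i x_i ** w_i, computed as exp(sum_i w_i * log(x_i)).
    Assumes x_i > 0; illustrative weights, not from the report."""
    return math.exp(sum(wi * math.log(xi) for xi, wi in zip(x, w)))

# With integer weights the unit reduces to a monomial, e.g. x1^2 * x2:
y = product_unit([3.0, 4.0], [2.0, 1.0])    # 3^2 * 4 = 36
```

Because the exponents w_i are learnable and need not be integers, a single product unit can represent the higher-order input combinations that a summation unit would need an extra layer to form.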
This superiority was confirmed when a pool of candidate units with four different nonlinear activation functions, competing for addition to the network, was used. Extensive simulations showed that for the problem of implementing random Boolean logic functions, product units are always chosen above any of the other transfer functions. (Also cross-referenced as UMIACS-TR-95-80)

Item: Routing in Optical Multistage Interconnection Networks: a Neural Network Solution (1998-10-15)
Authors: Giles, C. Lee; Goudreau, Mark W.

There has been much interest in using optics to implement computer interconnection networks. However, there has been little discussion of any routing methodologies besides those already used in electronics. In this paper, a neural network routing methodology is proposed that can generate control bits for an optical multistage interconnection network (OMIN). Though we present no optical implementation of this methodology, we illustrate its control for an optical interconnection network. These OMINs may be used as communication media for shared memory, distributed computing systems. The routing methodology makes use of an Artificial Neural Network (ANN) that functions as a parallel computer for generating the routes. The neural network routing scheme may be applied to electrical as well as optical interconnection networks. However, since the ANN can be implemented using optics, this routing approach is especially appealing for an optical computing environment. The parallel nature of the ANN computation may make this routing scheme faster than conventional routing approaches, especially for OMINs that are irregular. Furthermore, the neural network routing scheme is fault-tolerant. Results are shown for generating routes in a 16 x 16, 3-stage OMIN. (Also cross-referenced as UMIACS-TR-94-21)