Learning a Class of Large Finite State Machines with a Recurrent Neural Network
Giles, C. Lee
Horne, Bill G.
MetadataShow full item record
One of the issues in any learning model is how it scales with problem size. Neural networks have not been immune to scaling issues. We show that a dynamically-driven discrete-time recurrent network (DRNN) can learn rather large grammatical inference problems when the strings of a finite memory machine (FMM) are encoded as temporal sequences. FMMs are a subclass of finite state machines which have a finite memory or a finite order of inputs and outputs. The DRNN that learns the FMM is a neural network that maps directly from the sequential machine implementation of the FMM. It has feedback only from the output and not from any hidden units; an example is the recurrent network of Narendra and Parthasarathy. (FMMs that have zero order in the feedback of outputs are called definite memory machines and are analogous to Time-delay or Finite Impulse Response neural networks.) Due to their topology these DRNNs are as least as powerful as any sequential machine implementation of a FMM and should be capable of representing any FMM. We choose to learn ``particular FMMs.\' Specifically, these FMMs have a large number of states (simulations are for $256$ and $512$ state FMMs) but have minimal order, relatively small depth and little logic when the FMM is implemented as a sequential machine. Simulations for the number of training examples versus generalization performance and FMM extraction size show that the number of training samples necessary for perfect generalization is less than that necessary to completely characterize the FMM to be learned. This is in a sense a best case learning problem since any arbitrarily chosen FMM with a minimal number of states would have much more order and string depth and most likely require more logic in its sequential machine implementation. (Also cross-referenced as UMIACS-TR-94-94)