Synaptic Noise in Dynamically-driven Recurrent Neural Networks:
   Convergence and Generalization

Jim, Kam; Giles, C. Lee; Horne, Bill G.

Files

CS-TR-3322.ps (337.57 KB)

Date

1998-10-15

Abstract

There has been much interest in applying noise to feedforward neural networks in order to observe its effect on network performance. We extend these results by introducing and analyzing various methods of injecting synaptic noise into dynamically-driven recurrent networks during training. By analyzing and comparing the effects of these noise models on the error function, we found that applying a controlled amount of noise during training can improve convergence time and generalization performance. In addition, we analyze the effects of various noise parameters (additive vs. multiplicative, cumulative vs. non-cumulative, per time step vs. per sequence) and predict that the best overall performance can be achieved by injecting additive noise at each time step. Noise contributes a second-order gradient term to the error function, which can be viewed as an anticipatory agent to aid convergence. This term appears to find promising regions of weight space in the beginning stages of training, when the training error is large, and should improve convergence on error surfaces with local minima. Synaptic noise also enhances the error function by favoring internal representations where state nodes are operating in the saturated regions of the sigmoid discriminant function, thus improving generalization to longer sequences. We substantiate these predictions by performing extensive simulations on learning the dual parity grammar from grammatical strings encoded as temporal sequences with a second-order fully recurrent neural network. (Also cross-referenced as UMIACS-TR-94-89)
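
The noise parameters named in the abstract can be illustrated with a minimal Python sketch. It is not the report's implementation. The function name `noisy_weight_schedule`, the zero-mean Gaussian noise, the multiplicative form `w * (1 + n)`, and the reading of "cumulative" as noise building on the previously perturbed weights are all assumptions made for illustration; the abstract does not define these terms precisely.

```python
import random


def noisy_weight_schedule(weights, steps, sigma, *, multiplicative=False,
                          cumulative=False, per_step=True, seed=None):
    """Return a list with one perturbed copy of `weights` per time step.

    Assumed (hypothetical) reading of the abstract's noise parameters:
      multiplicative: w * (1 + n) instead of w + n, with n ~ N(0, sigma^2)
      cumulative:     new noise is applied on top of the already noisy weights
      per_step:       noise is redrawn at every time step; otherwise it is
                      drawn once and reused for the whole sequence
    """
    rng = random.Random(seed)

    def perturb(base):
        if multiplicative:
            return [w * (1.0 + rng.gauss(0.0, sigma)) for w in base]
        return [w + rng.gauss(0.0, sigma) for w in base]

    schedule = []
    current = list(weights)
    for t in range(steps):
        # Per-sequence noise is drawn only at t == 0 and then held fixed.
        if per_step or t == 0:
            base = current if cumulative else weights
            current = perturb(base)
        schedule.append(list(current))
    return schedule
```

Under these assumptions, the configuration the abstract predicts to perform best (additive noise injected at each time step) corresponds to `noisy_weight_schedule(w, steps=T, sigma=s)` with the default flags, where each returned weight vector would be used for one step of the recurrent update.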

URI (handle)

http://hdl.handle.net/1903/652

Collections

Technical Reports from UMIACS
Technical Reports of the Computer Science Department
