What Size Neural Network Gives Optimal Generalization? Convergence Properties of Backpropagation

Lawrence, Steve; Giles, C. Lee; Tsoi, Ah Chung

Files

CS-TR-3617.ps (2.12 MB)

CS-TR-3617.pdf (575.02 KB)

Date

1998-10-15

Authors

Lawrence, Steve

Giles, C. Lee

Tsoi, Ah Chung

Abstract

One of the most important aspects of any machine learning paradigm is how it scales according to problem size and complexity. Using a task with known optimal training error, and a pre-specified maximum number of training updates, we investigate the convergence of the backpropagation algorithm with respect to a) the complexity of the required function approximation, b) the size of the network in relation to the size required for an optimal solution, and c) the degree of noise in the training data. In general, for a) the solution found is worse when the function to be approximated is more complex, for b) oversize networks can result in lower training and generalization error, and for c) the use of committee or ensemble techniques can be more beneficial as the amount of noise in the training data is increased. For the experiments we performed, we do not obtain the optimal solution in any case. We further support the observation that larger networks can produce better training and generalization error using a face recognition example where a network with many more parameters than training points generalizes better than smaller networks.

Also cross-referenced as UMIACS-TR-96-22.
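
The kind of experiment the abstract describes can be sketched roughly as follows. This is a minimal illustration, not the report's actual setup: it assumes the "task with known optimal training error" is built by sampling targets from a fixed random "teacher" network, so that a "student" of at least the teacher's size could in principle reach zero error on noise-free data. The function names (`make_teacher_data`, `train_student`, `run_experiment`), the layer sizes, the learning rate, and the update budget are all hypothetical choices made for the example.

```python
import numpy as np

def make_teacher_data(rng, n_samples, n_in, n_hidden, noise_std=0.0):
    """Sample inputs and targets from a random one-hidden-layer tanh 'teacher'.

    Assumption for this sketch: the teacher defines the task, so the optimal
    noise-free training error is known to be zero.
    """
    W1 = rng.normal(size=(n_in, n_hidden))
    w2 = rng.normal(size=n_hidden) / np.sqrt(n_hidden)
    X = rng.normal(size=(n_samples, n_in))
    y = np.tanh(X @ W1) @ w2
    # Optional Gaussian noise on the targets (item c in the abstract).
    return X, y + noise_std * rng.normal(size=n_samples)

def train_student(rng, X, y, n_hidden, n_updates=2000, lr=0.05):
    """Fit a one-hidden-layer tanh network by batch gradient descent.

    Gradients are computed by backpropagation for the loss
    0.5 * mean((prediction - y)^2), with a fixed budget of n_updates.
    """
    n, n_in = X.shape
    W1 = 0.1 * rng.normal(size=(n_in, n_hidden))
    w2 = 0.1 * rng.normal(size=n_hidden)
    for _ in range(n_updates):
        H = np.tanh(X @ W1)                       # hidden activations, (n, h)
        err = H @ w2 - y                          # residuals, (n,)
        grad_w2 = H.T @ err / n                   # output-layer gradient
        grad_H = np.outer(err, w2) * (1.0 - H**2) # backprop through tanh
        grad_W1 = X.T @ grad_H / n                # hidden-layer gradient
        W1 -= lr * grad_W1
        w2 -= lr * grad_w2
    return W1, w2

def mse(params, X, y):
    """Mean squared error of a trained student on (X, y)."""
    W1, w2 = params
    return float(np.mean((np.tanh(X @ W1) @ w2 - y) ** 2))

def run_experiment(seed=0, student_sizes=(2, 5, 20), teacher_hidden=5,
                   n_in=5, n_train=200, n_test=1000, noise_std=0.0):
    """Train students of several sizes on one teacher task.

    Returns {student_hidden_units: (train_mse, test_mse)}.
    """
    rng = np.random.default_rng(seed)
    X, y = make_teacher_data(rng, n_train + n_test, n_in, teacher_hidden,
                             noise_std)
    X_tr, y_tr = X[:n_train], y[:n_train]
    X_te, y_te = X[n_train:], y[n_train:]
    results = {}
    for h in student_sizes:
        params = train_student(rng, X_tr, y_tr, n_hidden=h)
        results[h] = (mse(params, X_tr, y_tr), mse(params, X_te, y_te))
    return results

if __name__ == "__main__":
    for h, (tr, te) in run_experiment().items():
        print(f"hidden={h:3d}  train_mse={tr:.4f}  test_mse={te:.4f}")
```

Comparing the rows for students smaller than, equal to, and larger than the assumed teacher size is one way to probe item b) of the abstract. Raising `noise_std` and averaging the predictions of several students trained from different seeds would be a simple way to explore item c). The numbers produced by this sketch depend on the hypothetical settings above and are not results from the report.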

URI (handle)

http://hdl.handle.net/1903/809

Collections

Technical Reports from UMIACS
Technical Reports of the Computer Science Department