Product Unit Learning

Files

CS-TR-3503.ps (239.36 KB)
No. of downloads: 223
CS-TR-3503.pdf (232.72 KB)
No. of downloads: 592

Date

1998-10-15

Abstract

Product units provide a method of automatically learning the higher-order input combinations required for the efficient synthesis of Boolean logic functions by neural networks. Product units also have a higher information capacity than sigmoidal networks. However, this activation function has not received much attention in the literature. A possible reason for this is that one encounters some problems when using standard backpropagation to train networks containing these units. This report examines these problems, and evaluates the performance of three training algorithms on networks of this type. Empirical results indicate that the error surfaces of networks containing product units have more local minima than those of corresponding networks with summation units. For this reason, a combination of local and global training algorithms was found to provide the most reliable convergence.

We then investigate how "hints" can be added to the training algorithm. By extracting a common frequency from the input weights, and training this frequency separately, we show that convergence can be accelerated.

A constructive algorithm is then introduced which adds product units to a network as required by the problem. Simulations show that for the same problems this method creates a network with significantly fewer neurons than those constructed by the tiling and upstart algorithms.

In order to compare their performance with other transfer functions, product units were implemented as candidate units in the Cascade Correlation (CC) [Fahlman90] system. Using these candidate units resulted in smaller networks which trained faster than when any of the standard (three sigmoidal types and one Gaussian) transfer functions were used. This superiority was confirmed when a pool of candidate units with four different nonlinear activation functions was used, in which the units have to compete for addition to the network. Extensive simulations showed that for the problem of implementing random Boolean logic functions, product units are always chosen above any of the other transfer functions. (Also cross-referenced as UMIACS-TR-95-80)
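
For readers unfamiliar with the term, the following is a minimal sketch of a single product unit, not code from the report. It assumes the formulation commonly used in the product-unit literature, in which the unit outputs the real part of the product of its inputs raised to learnable weights; the function name `product_unit` and the restriction to nonzero inputs are illustrative assumptions.

```python
import math


def product_unit(inputs, weights):
    """Real part of prod_i(x_i ** w_i) for nonzero real inputs (assumed formulation).

    For a negative input x, x ** w = |x| ** w * exp(i * pi * w), so the real part
    of the whole product is prod_i(|x_i| ** w_i) * cos(pi * sum of w_i over negative x_i).
    """
    log_magnitude = 0.0
    negative_weight_sum = 0.0
    for x, w in zip(inputs, weights):
        if x == 0:
            raise ValueError("this sketch requires nonzero inputs")
        log_magnitude += w * math.log(abs(x))
        if x < 0:
            negative_weight_sum += w
    return math.exp(log_magnitude) * math.cos(math.pi * negative_weight_sum)


# With Boolean values encoded as -1/+1 and unit weights, one product unit
# computes a parity-style function: the sign flips with each -1 input.
even_negatives = product_unit([-1.0, 1.0, -1.0], [1.0, 1.0, 1.0])
odd_negatives = product_unit([-1.0, 1.0, 1.0], [1.0, 1.0, 1.0])
```

In this encoding a single unit separates inputs with an even number of -1 values from those with an odd number, which illustrates the kind of higher-order input combination the abstract refers to. The cosine term also suggests why a shared frequency in the input weights is a natural quantity to train separately, although how the report actually does this is not shown here.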
