FAULT DETECTION FRAMEWORK FOR IMBALANCED AND SPARSELY-LABELED DATA SETS USING SELF-ORGANIZING MAPS
Pecht, Michael G.
Machine learning techniques developed for fault detection usually assume that the classes in the training data are balanced, but this is seldom the case in real-world applications. These techniques also usually require labeled training data, which are costly and time-consuming to obtain. In this context, a data-driven framework is developed to detect faults in systems where the condition monitoring data are either imbalanced or consist mostly of unlabeled observations. To mitigate the problem of class imbalance, self-organizing maps (SOMs) are trained in a supervised manner, using the same map size for both classes of data, prior to performing classification. The optimal SOM size for balancing the classes in the data, the size of the neighborhood function, and the learning rate are determined by performing multiobjective optimization on SOM quality measures, such as quantization error and information entropy, and on performance measures, such as training time and classification error. For training data sets that contain a majority of unlabeled observations, a transductive semi-supervised approach is used to label the neurons of an unsupervised SOM before performing supervised SOM classification on the test data set. The developed framework is validated using artificial and real-world fault detection data sets.
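
The class-balancing idea in the abstract can be illustrated with a minimal sketch. It is not the author's implementation. It assumes a plain NumPy SOM with linearly decaying learning rate and Gaussian neighborhood, fixed hyperparameters in place of the multiobjective optimization, and a nearest-codebook-vector rule for classification, none of which the abstract specifies. The function names (`train_som`, `quantization_error`, `classify`) and the synthetic data are hypothetical. Training one map of the same size per class reduces each class to the same number of codebook vectors, regardless of how many observations it had.

```python
import numpy as np

def train_som(data, rows, cols, n_iter=500, lr0=0.5, sigma0=1.0, seed=0):
    """Train a small rectangular SOM; return its codebook (rows*cols, n_features)."""
    rng = np.random.default_rng(seed)
    # Initialize codebook vectors from randomly chosen training samples.
    weights = data[rng.integers(0, len(data), rows * cols)].astype(float)
    # Grid coordinates of each neuron, used by the neighborhood function.
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    for t in range(n_iter):
        frac = t / n_iter
        lr = lr0 * (1.0 - frac)               # linearly decaying learning rate
        sigma = sigma0 * (1.0 - frac) + 1e-3  # linearly shrinking neighborhood width
        x = data[rng.integers(0, len(data))]
        # Best-matching unit: the neuron whose codebook vector is closest to x.
        bmu = np.argmin(np.linalg.norm(weights - x, axis=1))
        grid_dist2 = np.sum((grid - grid[bmu]) ** 2, axis=1)
        h = np.exp(-grid_dist2 / (2.0 * sigma ** 2))  # Gaussian neighborhood
        weights += lr * h[:, None] * (x - weights)
    return weights

def quantization_error(data, weights):
    """Mean distance from each sample to its best-matching codebook vector."""
    d = np.linalg.norm(data[:, None, :] - weights[None, :, :], axis=2)
    return d.min(axis=1).mean()

def classify(x, codebooks):
    """Assumed rule: pick the class whose map holds the nearest codebook vector."""
    dists = {label: np.linalg.norm(w - x, axis=1).min() for label, w in codebooks.items()}
    return min(dists, key=dists.get)

# Synthetic, imbalanced two-class data (hypothetical): 500 healthy, 20 faulty samples.
rng = np.random.default_rng(42)
healthy = rng.normal(loc=0.0, scale=0.5, size=(500, 2))
faulty = rng.normal(loc=4.0, scale=0.5, size=(20, 2))

# Same 3x3 map size for both classes, so each class ends up with 9 prototypes.
codebooks = {
    "healthy": train_som(healthy, 3, 3, seed=1),
    "faulty": train_som(faulty, 3, 3, seed=2),
}

for label, data in (("healthy", healthy), ("faulty", faulty)):
    print(label, "quantization error:", quantization_error(data, codebooks[label]))
print(classify(np.array([0.1, -0.2]), codebooks))
print(classify(np.array([3.8, 4.1]), codebooks))
```

In this sketch the map size, learning rate, and neighborhood width are fixed by hand; in the framework described above they would instead be selected by trading off quality measures such as the quantization error against training time and classification error.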