Comparison Studies of Several Microphone Robustness Techniques
Files
Publication or External Link
Date
Advisor
Citation
DRUM DOI
Abstract
We study the effectiveness of various microphone robustness techniques from the viewpoint of speech recognition, utilizing the ARPA-sponsored Wall Street Journal (WSJ) data base [1]. Two of the techniques considered are being introduced in this paper: two cepstral normalization algorithms utilizing the artificial neural network techniques Self Organizing Map (SOM) and Learning Vector Quantization (LVQ). The algorithms obtained are low- complexity non-parametric counterparts of the parametric approaches Codeword-dependent Cepstral Normalization (CDCN) and Fixed CDCN (FCDCN). The other techniques considered are Cepstral Mean Normalization (CMN), RASTA, SNR-dependent Cepstral Normalization (SDCN), Interpolated SDCN (ISDCN), CDCN, FCDCN; some of these techniques require one or more of the following information: stereo data, SNR estimate, single microphone data for adaptation, and knowledge of the microphone used for the specific data under test. We determine the effectiveness in several ways: (i) scattergram plot of the speech frame parameter vector (usually a cepstral vector), (ii) adjusted deviation ratio, measured from scattergram, and (iii) correctness of classifying a test vector into a vector code book. All these measures have direct correlation with speech recognition performance, which will be measured with experiments to be conducted.