Institute for Systems Research
Permanent URI for this community: http://hdl.handle.net/1903/4375
Search Results (3 items)
Comparison Studies of Several Microphone Robustness Techniques (1994)
Sonmez, M.K.; Kao, Yu-Hung; Rajasekaran, P.K.; Baras, John S.; ISR
We study the effectiveness of various microphone robustness techniques from the viewpoint of speech recognition, using the ARPA-sponsored Wall Street Journal (WSJ) database [1]. Two of the techniques considered are introduced in this paper: two cepstral normalization algorithms based on the artificial neural network techniques Self-Organizing Map (SOM) and Learning Vector Quantization (LVQ). The resulting algorithms are low-complexity, non-parametric counterparts of the parametric approaches Codeword-dependent Cepstral Normalization (CDCN) and Fixed CDCN (FCDCN). The other techniques considered are Cepstral Mean Normalization (CMN), RASTA, SNR-dependent Cepstral Normalization (SDCN), Interpolated SDCN (ISDCN), CDCN, and FCDCN. Some of these techniques require one or more of the following: stereo data, an SNR estimate, single-microphone data for adaptation, and knowledge of which microphone was used to record the test data. We measure effectiveness in three ways: (i) scattergram plots of the speech frame parameter vector (usually a cepstral vector), (ii) the adjusted deviation ratio, measured from the scattergram, and (iii) the accuracy of classifying a test vector into a vector codebook. All of these measures correlate directly with speech recognition performance, which will be measured in experiments still to be conducted.

Robustness Study of Free-Text Speaker Identification and Verification (1993)
Kao, Yu-Hung; Baras, J.S.; ISR
Usable free-text speaker identification and verification systems must be robust under varying operating conditions. We studied the degree of robustness provided by various signal processing techniques: spectral subtraction, bandpass liftering, RASTA filtering, ISDCN, and stereo database normalization. The experiments were performed on a widely used, challenging long-distance telephone database. This database consists of data recorded at two different sites, with the data from one site of much poorer quality than the other; in addition, the recording equipment was inadvertently changed for the latter half of the sessions, significantly altering the recording environment. Our study identifies the combination of techniques that provides consistent and significant improvements; our results surpass other published results on the same task. We further verified the results on two other databases and achieved consistent improvements. Detailed results of exhaustive experiments are presented, along with discussion.

Low Complexity CELP Speech Coding at 4.8 kbps (1992)
Kao, Yu-Hung; Baras, J.S.; ISR
Low-bit-rate, high-quality speech coding is a vital part of voice telecommunication systems. CELP (Codebook Excited Linear Prediction) speech coding, introduced in 1982, provides a feasible way to compress speech to 4.8 kbps with high quality, but the formidable computational complexity of real-time processing has prevented its wide application. In this thesis, we reduce the computational complexity to 5 MIPS (million instructions per second), which even inexpensive DSP chips can handle, while maintaining the same high quality. We hope this contribution will make CELP coding a widely applicable technology.
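To show where the computational burden of CELP comes from, here is a minimal sketch of a plain exhaustive analysis-by-synthesis codebook search. It assumes a random Gaussian codebook, a fixed all-pole synthesis filter 1/A(z) with zero initial state, and no perceptual weighting or adaptive (pitch) codebook. The function names `synthesize` and `search_codebook` and all parameter values are hypothetical. This is the baseline search whose cost a low-complexity coder must reduce, not the thesis's own method. Each of the K candidates is filtered through a p-th order filter over an L-sample subframe, so the search costs roughly K·L·p multiply-adds per subframe.

```python
import random

def synthesize(excitation, a):
    """Hypothetical helper: filter an excitation through 1/A(z), where
    A(z) = 1 + a[0] z^-1 + ... + a[p-1] z^-p, starting from zero state."""
    out = []
    for n, e in enumerate(excitation):
        y = e
        for k, ak in enumerate(a, start=1):
            if n - k >= 0:
                y -= ak * out[n - k]
        out.append(y)
    return out

def search_codebook(target, codebook, a):
    """Hypothetical exhaustive search: for each codevector c, synthesize
    y = H c, take the optimal gain g = <t, y> / <y, y>, and keep the index
    that minimizes the squared error ||t - g y||^2."""
    target_energy = sum(t * t for t in target)
    best_index, best_gain, best_err = None, 0.0, float("inf")
    for i, c in enumerate(codebook):
        y = synthesize(c, a)  # about L * p multiply-adds per candidate
        energy = sum(v * v for v in y)
        if energy == 0.0:
            continue
        corr = sum(t * v for t, v in zip(target, y))
        # At the optimal gain, ||t - g y||^2 = ||t||^2 - corr^2 / energy.
        err = target_energy - corr * corr / energy
        if err < best_err:
            best_index, best_gain, best_err = i, corr / energy, err
    return best_index, best_gain, best_err

# Usage: build a target from codevector 17 scaled by 0.5 and recover it.
rng = random.Random(0)
L, K = 40, 64        # hypothetical subframe length and codebook size
a = [-0.9]           # hypothetical first-order filter: A(z) = 1 - 0.9 z^-1
codebook = [[rng.gauss(0.0, 1.0) for _ in range(L)] for _ in range(K)]
target = [0.5 * v for v in synthesize(codebook[17], a)]
index, gain, err = search_codebook(target, codebook, a)
print(index, round(gain, 3))  # prints: 17 0.5
```

Because the error at the optimal gain reduces to ||t||² minus corr²/energy, the search is equivalent to maximizing corr²/energy. Practical low-complexity coders exploit this form, for example through structured or sparse codebooks, instead of brute-force filtering every candidate.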