Robust Speech Recognition by Topology Preserving Adaptation

Sonmez, Kemal S.

Files

PhD_2000-4.pdf (5.12 MB)

Date

2000

Authors

Sonmez, Kemal S.

Advisor

Baras, John S.

Abstract

Performance degradation caused by acoustical environment mismatch remains an important practical problem in speech recognition. The problem carries greater significance in applications over telecommunication channels, especially with the wider use of personal communications systems such as cellular phones, which invariably present challenging acoustical conditions. Such conditions are difficult to model analytically for a general speech representation, and most existing data-driven models require simultaneous ("stereo") recordings of training and testing environments, which are impractical to collect in most cases of interest.

In this dissertation, we propose an invariance principle for non-parametric speech representations in acoustical environments. We stipulate that the topology of the codevectors in a vector quantization (VQ) codebook, as defined in terms of class posterior distributions, will be preserved in a certain information-theoretic sense, and make this invariance principle our basis in deriving normalization algorithms that correct for the acoustical mismatch between environments.
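
As a rough illustration of what a codebook topology "defined in terms of class posterior distributions" could look like, the sketch below estimates P(class | codevector) from labeled frames and compares codevectors by the symmetric Kullback-Leibler divergence between those posteriors. The abstract does not specify the information-theoretic measure, so the symmetric KL divergence, the add-one smoothing, and all function names here are assumptions made for illustration, not the dissertation's definitions.

```python
import math
from collections import Counter


def nearest_codevector(frame, codebook):
    """Index of the codevector closest to `frame` (squared Euclidean distance)."""
    return min(
        range(len(codebook)),
        key=lambda k: sum((x - c) ** 2 for x, c in zip(frame, codebook[k])),
    )


def class_posteriors(frames, labels, codebook, classes, smoothing=1.0):
    """Estimate P(class | codevector) from labeled frames.

    Additive smoothing (an assumption) keeps every probability positive,
    so the divergence below is always finite.
    """
    counts = [Counter() for _ in codebook]
    for frame, label in zip(frames, labels):
        counts[nearest_codevector(frame, codebook)][label] += 1
    posteriors = []
    for c in counts:
        total = sum(c[k] for k in classes) + smoothing * len(classes)
        posteriors.append([(c[k] + smoothing) / total for k in classes])
    return posteriors


def symmetric_kl(p, q):
    """KL(p || q) + KL(q || p) for two strictly positive distributions."""
    return sum((pi - qi) * math.log(pi / qi) for pi, qi in zip(p, q))


def topology_matrix(posteriors):
    """Pairwise divergences between codevector posteriors (hypothetical 'topology')."""
    n = len(posteriors)
    return [[symmetric_kl(posteriors[i], posteriors[j]) for j in range(n)]
            for i in range(n)]


# Toy data: three 2-D codevectors and four labeled frames.
codebook = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
frames = [[0.1, 0.0], [0.9, 1.1], [5.1, 4.9], [0.0, 0.2]]
labels = ["a", "a", "b", "a"]
topology = topology_matrix(class_posteriors(frames, labels, codebook, ["a", "b"]))
```

Under this reading, a normalization algorithm would be expected to keep such a matrix approximately unchanged when the codebook is adapted from the training environment to the testing environment.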

We develop topology preserving algorithms in two frameworks, constrained distortion minimization (VQ with a topology preservation constraint) and information geometry (alternating minimization with a topology preservation constraint), and show their equivalence. Finally, we report results on the Wall Street Journal data, the Spoken Speed Dial corpus and the TI Cellular Corpus.

The algorithm is shown to improve performance significantly in all three tasks, most notably in the more difficult problem of cellular hands-free microphone speech, where the technique decreases the word error for continuous ten-digit recognition from 23.8% to 13.6% and the speaker-dependent voice calling sentence error from 16.5% to 10.6%.
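
To put the reported error rates in relative terms, the short calculation below computes the relative error reduction for the two cellular results quoted above. The helper name `relative_reduction` is introduced here for illustration only; the input figures are the ones given in the abstract.

```python
def relative_reduction(before, after):
    """Relative error reduction, as a percentage of the baseline error."""
    return 100.0 * (before - after) / before


# Error rates (in percent) as quoted in the abstract.
digit_reduction = relative_reduction(23.8, 13.6)    # continuous ten-digit word error
calling_reduction = relative_reduction(16.5, 10.6)  # voice calling sentence error

print(f"ten-digit recognition: {digit_reduction:.1f}% relative reduction")
print(f"voice calling:         {calling_reduction:.1f}% relative reduction")
```

This works out to roughly a 42.9% relative reduction in word error for ten-digit recognition and roughly a 35.8% relative reduction in sentence error for voice calling.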

URI (handle)

http://hdl.handle.net/1903/6147

Collections

Institute for Systems Research Technical Reports
