Robust Speech Recognition by Topology Preserving Adaptation
Abstract
Performance degradation caused by acoustical environment mismatch remains an important practical problem in speech recognition. The problem carries greater significance in applications over telecommunication channels, especially with the wider use of personal communications systems such as cellular phones, which invariably present challenging acoustical conditions. Such conditions are difficult to model analytically for a general speech representation, and most existing data-driven models require simultaneous ("stereo") recordings of training and testing environments, which are impractical to collect in most cases of interest.

In this dissertation, we propose an invariance principle for non-parametric speech representations in acoustical environments. We stipulate that the topology of the codevectors in a vector quantization (VQ) codebook, as defined in terms of class posterior distributions, will be preserved in a certain information-theoretic sense, and make this invariance principle our basis in deriving normalization algorithms that correct for the acoustical mismatch between environments.

We develop topology preserving algorithms in two frameworks, constrained distortion minimization (VQ with a topology preservation constraint) and information geometry (alternating minimization with a topology preservation constraint), and show their equivalence. Finally, we report results on the Wall Street Journal data, the Spoken Speed Dial corpus and the TI Cellular Corpus.

The algorithm is shown to improve performance significantly in all three tasks, most notably in the more difficult problem of cellular hands-free microphone speech, where the technique decreases the word error for continuous ten-digit recognition from 23.8% to 13.6% and the speaker-dependent voice calling sentence error from 16.5% to 10.6%.
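To put the two cellular results on a common scale, the absolute error rates quoted above can be converted to relative reductions. The short Python sketch below does only that arithmetic; the function and variable names are illustrative and do not come from the dissertation.

```python
def relative_reduction(before: float, after: float) -> float:
    """Return the error reduction as a percentage of the baseline error."""
    return 100.0 * (before - after) / before


# Error rates (in percent) quoted in the abstract for the cellular
# hands-free microphone task: (before adaptation, after adaptation).
digit_word_error = (23.8, 13.6)
voice_calling_sentence_error = (16.5, 10.6)

digit_rel = relative_reduction(*digit_word_error)
calling_rel = relative_reduction(*voice_calling_sentence_error)

print(f"ten-digit word error: {digit_rel:.1f}% relative reduction")
print(f"voice calling sentence error: {calling_rel:.1f}% relative reduction")
```

Under this definition, the figures in the abstract correspond to roughly a 42.9% relative reduction in word error and a 35.8% relative reduction in sentence error.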
Related items
Showing items related by title, author, creator and subject.
- Automatic Speech Codec Identification with Applications to Tampering Detection of Speech Recordings
Zhou, Jingting (2011) In this work many versions of CELP codecs are explored, and an observation is made that different codebooks are used to encode noisy part of residual. Taking advantage of noise patterns they generated, an algorithm was ...
- Representation of speech in the primary auditory cortex and its implications for robust speech processing
Mesgarani, Nima (2008-08-05) Speech has evolved as a primary form of communication between humans. This most used means of communication has been the subject of intense study for years, but there is still a lot that we do not know about it. It is an ...
- Discrimination of Speech From Non-Speech Based on Multiscale Spectro-Temporal Modulations
Mesgarani, Nima (2005-05-16) We describe a content-based audio classification algorithm based on novel multiscale spectrotemporal modulation features inspired by a model of auditory cortical processing. The task explored is to discriminate speech from ...