Synergy of Acoustic-Phonetics and Auditory Modeling Towards Robust Speech Recognition

dc.contributor.advisorEspy-Wilson, Carol Yen_US
dc.contributor.authorDeshmukh, Om Dadajien_US
dc.contributor.departmentElectrical Engineeringen_US
dc.contributor.publisherDigital Repository at the University of Marylanden_US
dc.contributor.publisherUniversity of Maryland (College Park, Md.)en_US
dc.date.accessioned2006-09-12T06:09:26Z
dc.date.available2006-09-12T06:09:26Z
dc.date.issued2006-08-31en_US
dc.description.abstractThe problem addressed in this work is that of enhancing speech signals corrupted by additive noise and improving the performance of automatic speech recognizers in noisy conditions. The enhanced speech signals can also improve the intelligibility of speech in noisy conditions for human listeners with hearing impairment as well as for normal listeners. The original Phase Opponency (PO) model, proposed to detect tones in noise, simulates the processing of the information in neural discharge times and exploits the frequency-dependent phase properties of the tuned filters in the auditory periphery along with the cross-auditory-nerve-fiber coincidence detection to extract temporal cues. The Modified Phase Opponency (MPO) proposed here alters the components of the PO model in such a way that the basic functionality of the PO model is maintained but the various properties of the model can be analyzed and modified independently of each other. This work presents a detailed mathematical formulation of the MPO model and the relation between the properties of the narrowband signal that needs to be detected and the properties of the MPO model. The MPO speech enhancement scheme is based on the premise that speech signals are composed of a combination of narrow band signals (i.e. harmonics) with varying amplitudes. The MPO enhancement scheme outperforms many of the other speech enhancement techniques when evaluated using different objective quality measures. Automatic speech recognition experiments show that replacing noisy speech signals by the corresponding MPO-enhanced speech signals leads to an improvement in the recognition accuracies at low SNRs. The amount of improvement varies with the type of the corrupting noise. Perceptual experiments indicate that: (a) there is little perceptual difference in the MPO-processed clean speech signals and the corresponding original clean signals and (b) the MPO-enhanced speech signals are preferred over the output of the other enhancement methods when the speech signals are corrupted by subway noise but the outputs of the other enhancement schemes are preferred when the speech signals are corrupted by car noise.en_US
dc.format.extent5957658 bytes
dc.format.mimetypeapplication/pdf
dc.identifier.urihttp://hdl.handle.net/1903/3952
dc.language.isoen_US
dc.subject.pqcontrolledEngineering, Electronics and Electricalen_US
dc.titleSynergy of Acoustic-Phonetics and Auditory Modeling Towards Robust Speech Recognitionen_US
dc.typeDissertationen_US

Files

Original bundle

Now showing 1 - 1 of 1
Loading...
Thumbnail Image
Name:
umi-umd-3832.pdf
Size:
5.68 MB
Format:
Adobe Portable Document Format