Discrimination of Speech From Non-Speech Based on Multiscale Spectro-Temporal Modulations

Mesgarani, Nima

Files

umi-umd-2483.pdf (589.25 KB)

No. of downloads: 1609

mainthesis.pdf (589.25 KB)

No. of downloads: 1851

Date

2005-05-16

Authors

Mesgarani, Nima

Advisor

Shamma, Shihab

Abstract

We describe a content-based audio classification algorithm based on novel multiscale spectro-temporal modulation features inspired by a model of auditory cortical processing. The task explored is to discriminate speech from non-speech sounds consisting of animal vocalizations, music, and environmental sounds. Although this task is relatively easy for humans, it remains difficult to automate well, especially in noisy and reverberant environments. The auditory model captures basic processes occurring from the early cochlear stages to the central cortical areas. It generates a multidimensional spectro-temporal representation of the sound, which is then analyzed by a multilinear dimensionality reduction technique and classified by a Support Vector Machine (SVM). The system's generalization to signals with high levels of additive noise and reverberation is evaluated and compared with two existing approaches [1] [2]. The results demonstrate the advantages of the auditory model over the other two systems, especially at low SNRs and high reverberation.
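
As an illustration of what evaluation "at low SNRs" involves, the sketch below mixes a clean signal with additive noise at a requested signal-to-noise ratio, using the common definition SNR_dB = 10 log10(P_signal / P_noise). The abstract does not state how the thesis constructed its noisy test signals, so this is an assumption, and the function names (mean_power, mix_at_snr, measure_snr_db) are hypothetical.

```python
import math


def mean_power(samples):
    """Average power of a signal: the mean of its squared samples."""
    return sum(x * x for x in samples) / len(samples)


def measure_snr_db(signal, noise):
    """SNR in dB, defined as 10 * log10(P_signal / P_noise)."""
    return 10 * math.log10(mean_power(signal) / mean_power(noise))


def mix_at_snr(signal, noise, target_snr_db):
    """Scale `noise` so that signal + noise has the requested SNR.

    Returns (mixture, scaled_noise). This is a generic mixing recipe,
    not the procedure documented in the thesis.
    """
    if len(signal) != len(noise):
        raise ValueError("signal and noise must have the same length")
    p_signal = mean_power(signal)
    p_noise = mean_power(noise)
    if p_signal == 0 or p_noise == 0:
        raise ValueError("signal and noise must both have non-zero power")
    # Noise power needed to reach the target SNR, then the matching amplitude gain.
    target_noise_power = p_signal / (10 ** (target_snr_db / 10))
    gain = math.sqrt(target_noise_power / p_noise)
    scaled_noise = [gain * n for n in noise]
    mixture = [s + n for s, n in zip(signal, scaled_noise)]
    return mixture, scaled_noise
```

A call such as mix_at_snr(speech, babble, -5.0), with speech and babble being equal-length sample lists, would yield a mixture in which the noise carries more power than the speech, which is the kind of condition the abstract describes as low SNR.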

URI (handle)

http://hdl.handle.net/1903/3044

Collections

UMD Theses and Dissertations
Electrical & Computer Engineering Theses and Dissertations
