The Acoustic Features of Speech Phonemes in a Model of Auditory Processing: Vowels and Unvoiced Fricatives.

TR_87-93.pdf(1.63 MB)
Shamma, S.
The acoustic features of three types of stimuli (a harmonic series, naturally spoken vowels, and unvoiced fricatives) are analyzed based on the response patterns they evoke in a model of auditory processing. The model consists of a peripheral cochlear stage, followed by two central neural networks. At the peripheral stage, the asymmetrical nature of the cochlear filters, combined with the preservation of the fine temporal structure of their outputs, provides a robust and level-tolerant spatiotemporal representation of the speech signals. At the subsequent central stages, the cochlear patterns are processed by two layers of lateral inhibitory networks (LIN) to extract the perceptually important parameters of the stimuli. For the harmonic series, an in-phase and an out-of-phase version (one harmonic inverted) are used to illustrate the role of the spatiotemporal cues in encoding the spectral and temporal features of the stimuli. With the more complex vowel sounds, the primary acoustic features encoded by the LIN outputs are the few largest harmonic components of the stimuli, i.e., those closest to the formant frequencies. The output patterns computed for different (male and female) speakers display moderate variability, especially in the locations of the output peaks. However, the results also suggest that the relative levels of the LIN peaks (or the weight distribution of the patterns) are a more stable and characteristic feature of the different vowel groups. The results for the unvoiced fricatives indicate that the most invariant and distinctive acoustic feature the auditory model extracts is the location of the high-frequency edge of each stimulus spectrum.
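
To make the idea of lateral inhibition concrete, the following is a minimal sketch of how two successive inhibitory layers can sharpen a pattern across frequency channels so that only its local peaks survive. It is not the report's model: it works on a static, purely spatial profile and ignores the temporal structure the abstract emphasizes. The function names, the nearest-neighbour inhibition rule, the `strength` value, and the input profile are all assumptions chosen for illustration.

```python
def lateral_inhibition(pattern, strength=0.5):
    """One illustrative LIN layer (assumed form, not the report's equations).

    Each channel is inhibited by its two nearest neighbours, and the result
    is half-wave rectified so that outputs are never negative.
    Edge channels reuse their own value as the missing neighbour.
    """
    n = len(pattern)
    out = []
    for i in range(n):
        left = pattern[max(i - 1, 0)]
        right = pattern[min(i + 1, n - 1)]
        value = pattern[i] - strength * (left + right)
        out.append(max(0.0, value))
    return out


def two_layer_lin(pattern, strength=0.5):
    """Apply the illustrative layer twice, mirroring the two-layer structure."""
    first = lateral_inhibition(pattern, strength)
    return lateral_inhibition(first, strength)


def peak_channels(pattern):
    """Indices of the channels that remain active after inhibition."""
    return [i for i, v in enumerate(pattern) if v > 0.0]


# Hypothetical across-channel profile with two broad bumps
# (a stand-in for a cochlear output pattern, not real data).
profile = [0.0, 1.0, 2.0, 4.0, 2.0, 1.0, 0.0, 1.0, 3.0, 1.0, 0.0]
sharpened = two_layer_lin(profile)
print(peak_channels(sharpened))
```

In this toy setting the broad bumps collapse to isolated active channels at their maxima, which is the qualitative behaviour the abstract attributes to the LIN stages when they pick out the largest harmonic components.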