An MRI-based articulatory and acoustic study of American English liquid sounds /r/ and /l/

dc.contributor.advisor: Espy-Wilson, Carol Y [en_US]
dc.contributor.author: Zhou, Xinhui [en_US]
dc.contributor.department: Electrical Engineering [en_US]
dc.contributor.publisher: Digital Repository at the University of Maryland [en_US]
dc.contributor.publisher: University of Maryland (College Park, Md.) [en_US]
dc.description.abstract: In American English, the liquid sounds /r/ and /l/ are the most articulatorily variable and complex sounds. They can be produced with several distinct types of tongue configuration, and they are the most troublesome sounds for children and nonnative English speakers to learn. A better understanding of this many-to-one mapping between articulation and acoustics would benefit other areas such as speech pathology, speaker verification, speech recognition and speech synthesis. In this dissertation, two articulatory configurations of each liquid were studied: a "retroflex" /r/ vs. a "bunched" /r/, and a light /l/ vs. a dark /l/. Unlike previous work on liquids, this study used finite element analysis to obtain the acoustic responses of three-dimensional (3-D) vocal tract models based on volumetric magnetic resonance (MR) imaging. Area function models were derived from the wave propagation properties inside the vocal tract. The retroflex /r/ and the bunched /r/ show similar patterns of F1-F3 but very different spacing between F4 and F5. Results from formant acoustic sensitivity functions and simple-tube vocal tract models suggest that this F4/F5 difference can be explained largely by whether the long cavity behind the palatal constriction acts as a half- or a quarter-wavelength resonator. For both the retroflex and the bunched /r/, F4 and F5 (along with F3 for the particular speakers studied in this research) come from the long back cavity. However, these formants are half-wavelength resonances for the retroflex /r/ but quarter-wavelength resonances for the bunched /r/. While both the dark /l/ and the light /l/ have a linguo-alveolar contact and two lateral channels, they differ in the length of the linguo-alveolar contact and in the presence of linguopalatal contacts caused by raising the sides of the tongue.
Both have similar patterns in F1-F3 but differ in the number and locations of zeros in the spectrum. For the dark /l/, only one zero occurs below 6 kHz, and it is produced by the cross mode posterior to the linguo-alveolar contact. For the light /l/, three zeros below 6 kHz are produced by the asymmetrical lateral channels, the supralingual cavity, and the cross mode posterior to the linguo-alveolar contact. Results from two simple vocal tract models show that the lateral channels must be asymmetrical, with an effective length of 3-6 cm, to produce a zero in the F3-F5 region. Using the Buckeye database, the acoustic variability and discriminative power of liquids were studied with mel-frequency band energy coefficients as the acoustic parameters. Analysis of variance shows that the inter-speaker variability of /r/ is larger than that of any other phoneme except /sh/, /s/ and /zh/. On average, /r/ and /l/ have larger inter-speaker variability than any other broad phonetic class. The average F-ratios of liquids are larger than those of glides, fricatives, affricates and stops, but smaller than those of nasals. Speaker identification experiments show that the ranking of average discriminative power for liquids and other broad phonetic classes is: /r/ > Glides > /l/ > Affricates > Fricatives > Stops > Nasals > Vowels. [en_US]
dc.format.extent: 4570360 bytes
dc.subject.pqcontrolled: Engineering, Electronics and Electrical [en_US]
dc.subject.pqcontrolled: Engineering, Mechanical [en_US]
dc.title: An MRI-based articulatory and acoustic study of American English liquid sounds /r/ and /l/ [en_US]
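The half- vs. quarter-wavelength distinction in the abstract can be illustrated with the textbook resonances of an idealized uniform, lossless tube. This is a sketch under stated assumptions, not the dissertation's finite element or area-function model: the speed of sound (35000 cm/s) and the 10 cm cavity length are assumed illustration values, and the function names are hypothetical. A tube with the same boundary condition at both ends resonates at f_n = n*c/(2L), while a tube closed at one end and open at the other resonates at f_n = (2n - 1)*c/(4L).

```python
# Idealized resonances of a uniform, lossless tube (textbook approximation).
# Assumptions, not taken from the dissertation: c = 35000 cm/s and the
# hypothetical 10 cm cavity length used in the example below.

SPEED_OF_SOUND_CM_S = 35000.0


def half_wave_resonances(length_cm, n_modes, c=SPEED_OF_SOUND_CM_S):
    """Same boundary condition at both ends (closed-closed or open-open):
    f_n = n * c / (2L), for n = 1, 2, ..."""
    return [n * c / (2.0 * length_cm) for n in range(1, n_modes + 1)]


def quarter_wave_resonances(length_cm, n_modes, c=SPEED_OF_SOUND_CM_S):
    """Closed at one end, open at the other:
    f_n = (2n - 1) * c / (4L), for n = 1, 2, ..."""
    return [(2 * n - 1) * c / (4.0 * length_cm) for n in range(1, n_modes + 1)]


if __name__ == "__main__":
    back_cavity_cm = 10.0  # hypothetical back-cavity length
    print("half-wave:   ", [round(f) for f in half_wave_resonances(back_cavity_cm, 4)])
    print("quarter-wave:", [round(f) for f in quarter_wave_resonances(back_cavity_cm, 4)])
```

For the assumed 10 cm cavity this prints [1750, 3500, 5250, 7000] Hz for the half-wave case and [875, 2625, 4375, 6125] Hz for the quarter-wave case: the same cavity length yields a different set of resonances depending only on how its ends are terminated, which is the kind of difference the abstract uses to explain the F4/F5 pattern of the two /r/ articulations.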

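The abstract's inter-speaker comparisons rest on analysis of variance with speaker as the factor. As a minimal sketch, assuming the standard one-way ANOVA definition (between-speaker mean square divided by within-speaker mean square) applied to one coefficient at a time, the F-ratio can be computed as below. The function name `f_ratio` and the toy values are hypothetical; the dissertation's exact normalization and its mel-frequency band energy features are not reproduced here, and some speaker-recognition work defines the F-ratio without the degrees-of-freedom correction.

```python
from statistics import fmean


def f_ratio(groups):
    """One-way ANOVA F statistic for a single acoustic coefficient.

    `groups` maps a (hypothetical) speaker label to that speaker's token
    values. Returns MS_between / MS_within.
    """
    values_by_speaker = [list(v) for v in groups.values()]
    samples = [x for v in values_by_speaker for x in v]
    k = len(values_by_speaker)  # number of speakers
    n = len(samples)            # total number of tokens
    if k < 2 or n <= k:
        raise ValueError("need at least two speakers and more tokens than speakers")

    grand_mean = fmean(samples)
    # Between-speaker sum of squares: spread of speaker means around the grand mean.
    ss_between = sum(len(v) * (fmean(v) - grand_mean) ** 2 for v in values_by_speaker)
    # Within-speaker sum of squares: spread of tokens around their own speaker's mean.
    ss_within = sum((x - fmean(v)) ** 2 for v in values_by_speaker for x in v)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within


if __name__ == "__main__":
    # Toy values only, not data from the dissertation.
    speaker_dependent = {"A": [1.0, 2.0, 3.0], "B": [4.0, 5.0, 6.0], "C": [7.0, 8.0, 9.0]}
    speaker_independent = {"A": [1.0, 5.0, 9.0], "B": [2.0, 5.0, 8.0], "C": [3.0, 5.0, 7.0]}
    print(f_ratio(speaker_dependent))    # 27.0
    print(f_ratio(speaker_independent))  # 0.0
```

A larger F-ratio means that the differences between speakers' means are large relative to the spread within each speaker, so the coefficient carries more speaker information. In the toy data, the first set gives F = 27.0, while the second set, in which every speaker has the same mean, gives F = 0.0.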

Original bundle
4.36 MB
Adobe Portable Document Format