Theses and Dissertations from UMD
Permanent URI for this community: http://hdl.handle.net/1903/2
New submissions to the thesis/dissertation collections are added automatically as they are received from the Graduate School. Currently, the Graduate School deposits all theses and dissertations from a given semester after the official graduation date. This means that there may be up to a four-month delay in the appearance of a given thesis/dissertation in DRUM.
More information is available at Theses and Dissertations at University of Maryland Libraries.
Search Results
9 results
Item Adult discrimination of children's voices over time: Voice discrimination of auditory samples from longitudinal research studies (2024)
Opusunju, Shelby; Bernstein Ratner, Nan; Hearing and Speech Sciences; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)

The human voice is subject to change over the lifespan, and these changes are even more pronounced in children. Acoustic properties of speech, such as fundamental frequency, amplitude, speech rate, and fluency, change dramatically as children grow and develop (Lee et al., 1999). Previous studies have established that listeners have a generally strong capacity to discriminate between adult speakers, as well as to identify the age of a speaker, based solely on the voice (Kreiman and Sidtis, 2011; Park, 2019). However, few studies have examined listeners' capacity to discriminate between the voices of children, particularly as the voice matures over time. This study examines how well adult listeners can discriminate between the voices of young children of the same age and at different ages. Single-word child language samples from different children (N = 6) were obtained from Munson et al. (2021) and used to create closed-set online AX voice discrimination tasks for adult listeners (N = 31). Three tasks examined listeners' accuracy and sensitivity in identifying whether a voice was that of the same child or a different child under three conditions: 1) between two children who are both three years old, 2) between two children who are both five years old, and 3) between children of different ages (three vs. five years old). Listeners performed at above-chance levels of accuracy and sensitivity when discriminating between the voices of three-year-old children and between the voices of five-year-old children, and performance did not differ significantly between these two tasks. No listeners demonstrated above-chance accuracy in discriminating between the voices of a single child at two different ages, and performance in this task was significantly poorer than in the other two. These findings demonstrate a sizable difference in adults' ability to discriminate child voices across two different ages compared with a single age. Possible explanations and implications for understanding child talker discrimination across different ages are discussed.
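The abstract reports listeners' accuracy and sensitivity in closed-set AX discrimination tasks. As a point of reference, a minimal sketch of how sensitivity (d′) is commonly computed from hit and false-alarm counts follows; the function, the correction term, and the example counts are illustrative assumptions, not details taken from the study (strictly, same-different designs call for a differencing model rather than this yes/no-style approximation).

```python
from scipy.stats import norm

def d_prime(hits, misses, false_alarms, correct_rejections):
    """Approximate d' from same/different response counts.

    Uses the log-linear (Hautus, 1995) correction so that perfect
    hit or false-alarm rates of 0 or 1 do not yield infinite z-scores.
    """
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    return norm.ppf(hit_rate) - norm.ppf(fa_rate)

# Hypothetical listener: 24 hits / 6 misses, 9 false alarms / 21 correct rejections.
print(d_prime(24, 6, 9, 21))  # a positive d' indicates above-chance discrimination
```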
Item Evaluating the role of acoustic cues in identifying the presence of a code-switch (2024)
Exton, Erika Lynn; Newman, Rochelle S.; Hearing and Speech Sciences; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)

Code-switching (switching between languages) is a common linguistic behavior in bilingual speech directed to infants and children. In adult-directed speech (ADS), acoustic-phonetic properties of one language may transfer to the other language close to a code-switch point; for example, English stop consonants may be more Spanish-like near a switch. This acoustically natural code-switching may be easier for bilingual listeners to comprehend than code-switching without these acoustic changes; however, it effectively makes the languages more phonetically similar at the point of a code-switch, which could make them difficult for an unfamiliar listener to distinguish. The goal of this research was to assess the acoustic-phonetic cues to code-switching available to listeners unfamiliar with the languages by studying the perception and production of these cues.

In Experiment 1, Spanish-English bilingual adults (particularly those who hear code-switching frequently), but not English monolingual adults, were sensitive to natural acoustic cues to code-switching in unfamiliar languages and could use them to identify language switches between French and Mandarin. Such cues were particularly helpful when they allowed listeners to anticipate an upcoming language switch (Experiment 2). In Experiment 3, monolingual children appeared unable to continually identify which language they were hearing. Experiment 4 provides some preliminary evidence that monolingual infants can identify a switch between French and Mandarin, though it does not address the utility of natural acoustic cues for infants. The acoustic detail of code-switched speech to infants was then investigated to evaluate how acoustic properties of bilingual infant-directed speech (IDS) are affected by the presence of, and proximity to, code-switching. Spanish-English bilingual women narrated wordless picture books in IDS and ADS, and the voice onset times (VOTs) of their English voiceless stops were analyzed in code-switching and English-only stories in each register. In ADS only, English voiceless stops that preceded an English-to-Spanish code-switch and were closer to that switch point were produced with more Spanish-like VOTs than more distant tokens. This effect of distance to Spanish on English VOTs did not hold for tokens that followed Spanish in ADS, or in either direction in IDS, suggesting that parents may avoid producing these acoustic cues when speaking to young children.
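The production analysis here relates English VOT to a token's distance from the code-switch point, separately by register. A hedged sketch of that kind of analysis appears below; the column names, example values, and the plain OLS interaction model are illustrative assumptions, not the study's actual data or statistical procedure.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical tidy table of English voiceless-stop tokens.
tokens = pd.DataFrame({
    "vot_ms": [62, 55, 41, 70, 48, 66, 58, 39],        # voice onset time (ms)
    "dist_to_switch": [8, 5, 1, 12, 2, 9, 6, 1],       # words before the switch point
    "register": ["ADS"] * 4 + ["IDS"] * 4,             # adult- vs. infant-directed speech
})

# Shorter distance to the switch predicting shorter (more Spanish-like) VOT
# would appear as a positive dist_to_switch coefficient within ADS (the
# reference level), with the interaction term testing whether IDS differs.
model = smf.ols("vot_ms ~ dist_to_switch * register", data=tokens).fit()
print(model.summary())
```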
Item MODELING ADAPTABILITY MECHANISMS OF SPEECH PERCEPTION (2024)
Jurov, Nika; Feldman, Naomi H.; Idsardi, William; Linguistics; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)

Speech is a complex, redundant, and variable signal produced in a noisy and ever-changing world. How do listeners navigate these complex auditory scenes and continuously and effortlessly understand most of the speakers around them? Studies show that listeners can quickly adapt to new situations, accents, and even distorted speech. Although prior research has established that listeners rely more on some speech cues (also called features or dimensions) than others, it is not yet understood how listeners weight cues flexibly on a moment-to-moment basis when the input deviates from standard speech. This thesis computationally explores flexible cue re-weighting as an adaptation mechanism, using real speech corpora. The computational framework is rate-distortion theory, which models a channel optimized on a trade-off between distortion and rate: on the one hand, the input signal should be reconstructed with minimal error after it passes through the channel; on the other hand, the channel needs to extract parsimonious information from the incoming data. This channel can be implemented as a neural network with a beta variational autoencoder. We use this model to show that two mechanistic components are needed for adaptation: focus and switch. We first show that focus on a cue mimics human behavior better than cue weights that simply depend on long-term statistics, as has largely been assumed in prior research. Second, we show a new model that can quickly adapt and switch feature weighting depending on the input at a particular moment.

This model's flexibility comes from implementing a cognitive mechanism that has been called "selective attention" with multiple encoders: each encoder serves as a focus on a different part of the signal, and we can then choose how much to rely on each focus depending on the moment. Finally, we ask whether cue weighting is informed by the ability to separate noise from speech. To this end, we adapt adversarial feature-disentanglement training from vision to disentangle speech (noise) features from noise (speech) labels. We show that although this does not yield human-like cue weighting behavior, disentanglement does lead the model to weight spectral information slightly more than temporal information compared to the baselines. Overall, this thesis explores adaptation computationally and offers a possible mechanistic explanation for "selective attention" with focus and switch mechanisms, based on rate-distortion theory. It also argues that cue weighting cannot be determined solely from speech carefully articulated in laboratories or in quiet. Lastly, it explores a way to inform speech models from a cognitive angle, making them more flexible and robust, as human speech perception is.
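The rate-distortion channel described here is commonly trained with the beta-VAE objective sketched below, where the reconstruction term plays the role of distortion and the KL term the role of rate; this is the standard textbook form, and the thesis's exact loss may differ:

```latex
\mathcal{L}(\theta,\phi;x) \;=\;
\underbrace{\mathbb{E}_{q_\phi(z \mid x)}\!\big[\lVert x - \hat{x}_\theta(z) \rVert^2\big]}_{\text{distortion}}
\;+\; \beta\,
\underbrace{D_{\mathrm{KL}}\!\big(q_\phi(z \mid x)\,\Vert\,p(z)\big)}_{\text{rate}}
```

Raising beta forces the channel to spend fewer bits on the input, so the encoder keeps only the cues that matter most for reconstruction, which is what makes the framework a natural home for cue re-weighting.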
Item The Role of Age and Bilingualism on Perception of Vocoded Speech (2020)
Waked, Arifi Noman; Goupell, Matthew J; Ratner, Nan; Hearing and Speech Sciences; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)

This dissertation examines the roles of age and bilingualism in the perception of vocoded speech, in order to determine whether bilingual individuals, children, and bilingual individuals with later ages of second language acquisition show greater difficulty perceiving vocoded speech. Measures of language skill and verbal inhibition were also examined in relation to vocoded speech perception. Two studies were conducted, each with two participant language groups: monolingual English speakers and bilingual Spanish-English speakers. The first study also explored the role of age at the time of testing by including monolingual and bilingual children (8-10 years) and monolingual and bilingual adults (18+ years), for four total groups of adult and child language pairs. Participants were tested on vocoded stimuli simulating speech as perceived through an 8-channel cochlear implant (CI) in conditions of both deep (0-mm shift) and shallow (6-mm shift) insertion of the electrode array. Between testing trials, participants were trained on the more difficult, 6-mm shift condition. The second study explored the role of age of second language acquisition in native speakers of Spanish (18+ years) first exposed to English at ages ranging from 0 to 12 years, along with a control group of monolingual English speakers (18+ years). This study examined perception of target lexical items presented either in isolation or at the end of sentences. Stimuli were either unaltered or vocoded to simulate speech as heard through an 8-channel CI at 0-mm shift. Items presented in isolation were divided into differing levels of difficulty based on frequency and neighborhood density; target items presented at the ends of sentences were divided into differing levels of difficulty based on the degree of semantic context provided by the sentence. No effects of age at testing or age of acquisition were found, and in the first study there was also no effect of language group.

All groups improved with training and showed significant improvement between pre- and post-test speech perception scores in both conditions of shift. In the second study, all participants were significantly negatively impacted by vocoding; however, bilingual participants showed greater difficulty than their monolingual peers in perceiving vocoded lexical items presented in isolation. This group difference was not found in the sentence conditions, where all participants benefited significantly from greater semantic context. From this, we can conclude that bilingual individuals can use semantic context to perceive vocoded speech similarly to their monolingual peers. Neither language skills nor verbal inhibition, as measured in these studies, significantly affected speech perception scores in any of the tested conditions across groups.
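An n-channel noise vocoder of the kind used to simulate CI listening divides the signal into frequency bands, extracts each band's temporal envelope, and uses the envelopes to modulate band-limited noise. The sketch below is a generic, hedged implementation: the filter design, channel spacing, and normalization are illustrative choices rather than the study's stimulus-generation code, and the frequency-to-place shift conditions (0 mm vs. 6 mm) would additionally require re-synthesizing into basally shifted output bands.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def noise_vocode(x, fs, n_channels=8, lo=100.0, hi=8000.0):
    """Generic noise vocoder for a 1-D float signal x (assumes fs > 2 * hi)."""
    # Channel edges spaced logarithmically between lo and hi; CI simulations
    # often use Greenwood-function spacing instead (illustrative choice).
    edges = np.geomspace(lo, hi, n_channels + 1)
    noise = np.random.randn(len(x))
    out = np.zeros(len(x))
    for f1, f2 in zip(edges[:-1], edges[1:]):
        sos = butter(4, [f1, f2], btype="bandpass", fs=fs, output="sos")
        band = sosfiltfilt(sos, x)
        env = np.abs(hilbert(band))        # temporal envelope of this channel
        carrier = sosfiltfilt(sos, noise)  # noise restricted to the same band
        out += env * carrier               # envelope-modulated noise channel
    return out / (np.max(np.abs(out)) + 1e-9)  # peak-normalize the sum
```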
Item The use of the domestic dog (Canis familiaris) as a comparative model for speech perception (2020)
Mallikarjun, Amritha; Newman, Rochelle S; Neuroscience and Cognitive Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)

Animals have long been used as comparative models for adult human speech perception. However, few animal models have been used to explore developmental speech perception questions. This dissertation encourages the use of domestic dogs as a behavioral model for speech perception processes. Specifically, dog models are suggested for questions about 1) the role and function of underlying processes responsible for different aspects of speech perception, and 2) the effect of language experience on speech perception processes. Chapters 2, 3, and 4 examined the contributions of auditory, attention, and linguistic processing skills to infants' difficulties understanding speech in noise. It is not known why infants have more difficulty than adults perceiving speech in noise, especially single-talker noise. Understanding speech in noise relies on infants' auditory, attention, and linguistic processes, and it is methodologically difficult to isolate these systems' contributions when testing infants. To tease apart these systems, I compared dogs' name recognition in nine-talker and single-talker background noise to that of infants. These studies suggest that attentional processes play a large role in infants' difficulties understanding speech in noise. Chapter 5 explored the reasons behind infants' shift from a preference for vowel information (vowel bias) to consonant information (consonant bias) in word identification. This shift may occur due to language exposure, or to possessing a particular lexicon size and structure. To better understand the linguistic exposure necessary for consonant bias development, I tested dogs, who have long-term linguistic exposure and a minimal vocabulary. Dogs demonstrated a vowel bias rather than a consonant bias, suggesting that a small lexicon and regular linguistic exposure, plus mature auditory processing, do not lead to consonant bias emergence. Overall, these chapters suggest that dog models can be useful for broad questions about the systems underlying speech perception and about the role of language exposure in the development of certain speech perception processes. However, the studies faced limitations due to a lack of knowledge about dogs' underlying cognitive systems and linguistic exposure. More fundamental research is necessary to characterize dogs' linguistic exposure and to understand their auditory, attentional, and linguistic processes in order to ask more specific comparative research questions.

Item Bayesian Model of Categorical Effects in L1 and L2 Speech Processing (2014)
Kronrod, Yakov; Feldman, Naomi; Linguistics; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)

In this dissertation I present a model that captures categorical effects in both first language (L1) and second language (L2) speech perception. In L1 perception, categorical effects range from extremely strong for consonants to nearly continuous perception for vowels. I treat speech perception as a statistical inference problem, and by quantifying categoricity I obtain a unified model of both strong and weak categorical effects. In this optimal inference mechanism, the listener uses their knowledge of categories and the acoustics of the signal to infer the intended productions of the speaker. The model splits speech variability into meaningful category variance and perceptual noise variance. The ratio of these two variances, which I call Tau, directly correlates with the degree of categorical effects for a given phoneme or continuum. By fitting the model to behavioral data from different phonemes, I show how variation in a single quantitative parameter can lead to the different degrees of categorical effects seen in perception experiments with different phonemes. In L2 perception, L1 categories have been shown to affect how L2 sounds are identified and how well the listener can discriminate them. Various models have been developed to relate the state of L1 categories to both the initial and eventual ability to process the L2. These models have largely lacked a formalized metric of perceptual distance, a means of making a priori predictions of behavior for a new contrast, and a way of describing non-discrete gradient effects. In the second part of my dissertation, I apply the same computational model used to unify L1 categorical effects to L2 perception. I show that the model can make the same types of predictions as other second language acquisition (SLA) models while also providing a quantitative framework that formalizes all measures of similarity and bias. Further, I show how using this model to consider L2 learners at different stages of development lets us track specific category parameters as they change over time, giving us a view into the actual process of L2 category development.
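One standard way to formalize this kind of optimal inference, consistent with the abstract's description though not necessarily the thesis's exact parameterization: the speaker's intended target T is drawn from a category distribution, the heard signal S adds perceptual noise, and the listener's percept is the posterior mean, a weighted compromise between the signal and the category mean:

```latex
T \sim \mathcal{N}(\mu_c, \sigma_c^2), \qquad
S \mid T \sim \mathcal{N}(T, \sigma_S^2), \qquad
\mathbb{E}[T \mid S] \;=\; \frac{\sigma_c^2}{\sigma_c^2 + \sigma_S^2}\, S
\;+\; \frac{\sigma_S^2}{\sigma_c^2 + \sigma_S^2}\, \mu_c
```

The ratio of the two variances (the abstract's Tau, up to parameterization) governs this weighting: when perceptual noise dominates, percepts shrink toward the category mean (strong categorical effects, as with consonants); when category variance dominates, perception tracks the signal nearly continuously (as with vowels).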
Item FROM SOUND TO MEANING: QUANTIFYING CONTEXTUAL EFFECTS IN RESOLUTION OF L2 PHONOLEXICAL AMBIGUITY (2014)
Lukianchenko, Anna; Gor, Kira; Second Language Acquisition and Application; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)

In order to comprehend speech, listeners have to combine low-level phonetic information about the incoming auditory signal with higher-order contextual information. Unlike native listeners, nonnative listeners perceive speech sounds through the prism of their native language, which sometimes results in perceptual ambiguity in their second language. Across four experiments, both behavioral and electrophysiological, this dissertation provides evidence that such perceptual ambiguity causes words to become temporarily indistinguishable. To comprehend meaning, nonnative listeners disambiguate words by accessing their semantic, syntactic, and morphological characteristics. Syntactic and semantic cues produce a stronger context effect than morphological cues in both native and nonnative groups. Thus, although nonnative representations may differ in that they may lack phonological specification, the mechanisms associated with the use of higher-order contextual information for meaning resolution in auditory sentence comprehension are essentially the same in the native and nonnative languages.

Item The use of acoustic cues in phonetic perception: Effects of spectral degradation, limited bandwidth and background noise (2011)
Winn, Matthew Brandon; Chatterjee, Monita; Idsardi, William J; Hearing and Speech Sciences; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)

Hearing impairment, cochlear implantation, background noise, and other auditory degradations result in the loss or distortion of sound information thought to be critical to speech perception. In many cases, listeners can still identify speech sounds despite degradations, but our understanding of how this is accomplished is incomplete. The experiments presented here tested the hypothesis that listeners utilize acoustic-phonetic cues differently when one or more cues are degraded by hearing impairment or simulated hearing impairment. Results supported this hypothesis for various listening conditions that are directly relevant for clinical populations. Analysis included mixed-effects logistic modeling of the contributions of individual acoustic cues for various contrasts. Listeners with cochlear implants (CIs), and normal-hearing (NH) listeners in CI simulations, showed increased use of acoustic cues in the temporal domain and decreased use of cues in the spectral domain for the tense/lax vowel contrast and the word-final fricative voicing contrast. For the word-initial stop voicing contrast, NH listeners made less use of voice onset time and greater use of voice pitch in conditions that simulated high-frequency hearing impairment and/or masking noise; the influence of these cues was further modulated by consonant place of articulation. A pair of experiments measured phonetic context effects for the "s/sh" contrast, replicating previously observed effects for NH listeners and generalizing them to CI listeners as well, despite known deficiencies in spectral resolution for CI listeners. For NH listeners in CI simulations, these context effects were absent or negligible. Audio-visual delivery of this experiment revealed an enhanced influence of visual lip-rounding cues for CI listeners and for NH listeners in CI simulations. Additionally, CI listeners demonstrated that visual cues to gender influence phonetic perception in a manner consistent with gender-related voice acoustics. All of these results suggest that listeners are able to accommodate challenging listening situations by capitalizing on the natural (multimodal) covariance in speech signals.
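Cue-weighting analyses of this kind typically regress listeners' binary phonetic responses on the acoustic cue values of each stimulus, with coefficient magnitudes indexing each cue's contribution. Below is a hedged Python sketch of the fixed-effects core of such an analysis; the column names and values are invented, and the dissertation's actual models were mixed-effects logistic regressions (with random effects for listeners), which this simplified fit omits.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical per-trial data for a word-final fricative voicing contrast;
# the cue columns (z-scored vowel duration and F1 offset) are stand-ins.
trials = pd.DataFrame({
    "resp_voiced": [1, 0, 1, 1, 0, 0, 1, 0, 1, 0],
    "vowel_dur_z": [1.2, -0.8, -0.4, 1.5, 1.1, -1.1, -0.9, -1.4, 1.0, -0.6],
    "f1_offset_z": [0.4, 0.6, 0.8, 0.1, 0.2, -0.5, -0.3, -0.7, 0.3, -0.1],
})

# Each coefficient estimates how strongly that cue pushes responses toward
# "voiced"; comparing magnitudes across listening conditions indexes cue use.
fit = smf.logit("resp_voiced ~ vowel_dur_z + f1_offset_z", data=trials).fit(disp=0)
print(fit.params)
```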
Additionally, these results imply potential differences between speech perception by NH listeners and listeners with hearing impairment that would be overlooked by traditional word recognition or consonant confusion matrix analyses.

Item Infant Speech-in-Noise Perception and Later Phonological Awareness: A Longitudinal Study (2008-10-20)
Stimley, Sarah Elizabeth; Newman, Rochelle; Hearing and Speech Sciences; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)

While differences have been found among infants on a variety of speech perception skills, including speech perception in the presence of background noise, the implications of these differences for later language skills are currently unknown. This study examines the relationship between a specific measure of infant speech perception in noise and later phonological awareness outcomes. To test this relationship, individuals who participated in Newman's (2005) study on infant speech perception in the presence of background noise were administered a battery of language, phonological awareness, and intelligence tests. Scores from these tests were analyzed to see whether performance differences existed between those who had performed well as infants in the original study and those who had not. No significant differences between these two groups were found on the phonological awareness measures. Potential reasons for these findings and suggestions for future research are discussed.