UMD Theses and Dissertations

Permanent URI for this collection: http://hdl.handle.net/1903/3

New submissions to the thesis/dissertation collections are added automatically as they are received from the Graduate School. Currently, the Graduate School deposits all theses and dissertations from a given semester after the official graduation date. This means that there may be up to a 4-month delay in the appearance of a given thesis/dissertation in DRUM.

More information is available at Theses and Dissertations at University of Maryland Libraries.

Search Results

Now showing 1 - 7 of 7
  • Item
    EFFECTS OF AGE ON CONTEXT BENEFIT FOR UNDERSTANDING COCHLEAR-IMPLANT PROCESSED SPEECH
    (2024) Tinnemore, Anna; Gordon-Salant, Sandra; Goupell, Matthew J; Neuroscience and Cognitive Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    The number of people over 65 years old in the United States is rapidly growing as the generation known as “Baby Boomers” reaches this milestone. Currently, at least 16 million of these older adults struggle to communicate effectively because of disabling hearing loss. An increasing number of older adults with hearing loss are electing to receive a cochlear implant (CI) to partially restore their ability to communicate effectively. CIs provide access to speech information, albeit in a highly degraded form. This degradation can frequently make individual words unclear. While predictive sentence contexts can often be used to resolve individual unclear words, many factors either enhance or diminish the benefit of sentence context. This dissertation presents three complementary studies designed to address some of these factors, specifically: (1) the location of the unclear word in the context sentence, (2) how much background noise is present, and (3) individual factors such as age and hearing loss. The first study assessed the effect of context for adult listeners with acoustic hearing when a target word was presented in different levels of background noise at the beginning or end of sentences that varied in predictive context. Both context sentences and target words were spectrally degraded as a simulation of sound processed by a CI. The second study evaluated how listeners with CIs use context under the same conditions of background noise, sentence position, and predictive contexts as the group with acoustic hearing. The third study used eye-tracking methodology to infer information about the real-time processing of degraded speech across ages in a group of people who had acoustic hearing and a group of people who used CIs. Results from these studies indicate that target words at the beginning of a context sentence are more likely to be interpreted as consistent with the following context than target words at the end of the sentence. In addition, the age of the listener interacted with some of the other experimental variables to predict phoneme categorization performance and response times in both listener groups. In the study of real-time language processing, there were no significant differences in the gaze trajectories of listeners with CIs and listeners with acoustic hearing. Together, these studies confirm that older listeners can use context in a manner similar to younger listeners, although more slowly. These studies expand the field’s knowledge of the importance of an unclear word’s location within a sentence and draw attention to the strategies individual listeners employ to use context. The results of these experiments provide vital data needed to assess the current usage of context in the aging population with CIs and to develop age-specific auditory rehabilitation efforts for improved communication.
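
The abstract above mentions spectrally degrading speech as a simulation of CI processing. For readers unfamiliar with the technique, here is a minimal noise-vocoder sketch of that kind of degradation; the channel count, filter orders, and cutoff frequencies are illustrative assumptions, not the dissertation's actual parameters.

```python
# Minimal noise-vocoder sketch of a CI simulation (assumed parameters).
import numpy as np
from scipy.signal import butter, filtfilt

def noise_vocode(signal, fs, n_channels=8, f_lo=100.0, f_hi=7000.0):
    """Replace spectral fine structure with noise, keeping per-channel envelopes."""
    # Logarithmically spaced analysis bands between f_lo and f_hi.
    edges = np.logspace(np.log10(f_lo), np.log10(f_hi), n_channels + 1)
    noise = np.random.default_rng(0).standard_normal(len(signal))
    env_b, env_a = butter(2, 50.0 / (fs / 2), btype="low")  # 50 Hz envelope cutoff
    out = np.zeros(len(signal))
    for lo, hi in zip(edges[:-1], edges[1:]):
        b, a = butter(2, [lo / (fs / 2), hi / (fs / 2)], btype="band")
        band = filtfilt(b, a, signal)
        env = np.maximum(filtfilt(env_b, env_a, np.abs(band)), 0.0)  # envelope
        out += env * filtfilt(b, a, noise)  # envelope-modulated noise carrier
    return out / np.max(np.abs(out))

# Example: vocode one second of a synthetic, speech-like modulated tone.
fs = 16000
t = np.arange(fs) / fs
speechlike = np.sin(2 * np.pi * 150 * t) * (1 + 0.8 * np.sin(2 * np.pi * 4 * t))
degraded = noise_vocode(speechlike, fs)
```
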
  • Item
    The Learning and Usage of Second Language Speech Sounds: A Computational and Neural Approach
    (2023) Thorburn, Craig Adam; Feldman, Naomi H; Linguistics; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Language learners need to map a continuous, multidimensional acoustic signal to discrete abstract speech categories. The complexity of this mapping poses a difficult learning problem, particularly for second language learners, who struggle to acquire the speech sounds of a non-native language and almost never reach native-like ability. A common example used to illustrate this phenomenon is the distinction between /r/ and /l/ (Goto, 1971). While these sounds are distinct in English and native English speakers easily distinguish the two, native Japanese speakers find this difficult, as the sounds are not contrastive in their language. Even with much explicit training, Japanese speakers do not seem to be able to reach native-like ability (Logan, Lively, & Pisoni, 1991; Lively, Logan, & Pisoni, 1993). In this dissertation, I closely explore the mechanisms and computations that underlie effective second-language speech sound learning. I study a case of particularly effective learning: a video game paradigm in which non-native speech sounds have functional significance (Lim & Holt, 2011). I discuss the paradigm's relationship to a Dual Systems Model of auditory category learning and extend this model, bringing it together with the idea of perceptual space learning from infant phonetic learning. In doing this, I describe why different category types are better learned in different experimental paradigms and when different neural circuits are engaged. I propose a novel split in which different learning systems update different stages of the acoustic-phonetic mapping from speech to abstract categories. To do this, I formalize the video game paradigm computationally and implement a deep reinforcement learning network to map between environmental input and actions. In addition, I study how these categories could be used during online processing through an MEG study in which second-language learners of English listen to continuous naturalistic speech. I show that despite the challenges of speech sound learning, second language listeners are able to predict upcoming material by integrating different levels of contextual information, and show responses similar to those of native English speakers. I discuss the implications of these findings and how they could be integrated with the literature on the nature of speech representation in a second language.
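
The dissertation above implements a deep reinforcement learning network mapping environmental input to actions. As a toy stand-in, the sketch below trains a tiny policy network (plain numpy, REINFORCE) to map continuous "acoustic cue" vectors to rewarded game actions; the task, architecture, and hyperparameters are illustrative assumptions, not the model used in the thesis.

```python
# Toy policy-gradient learner: reward-driven mapping from cues to actions.
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hid, n_act = 2, 16, 2
W1 = rng.normal(0.0, 0.5, (n_hid, n_in)); b1 = np.zeros(n_hid)
W2 = rng.normal(0.0, 0.5, (n_act, n_hid)); b2 = np.zeros(n_act)
lr, baseline = 0.05, 0.0

def policy(x):
    h = np.tanh(W1 @ x + b1)              # hidden representation of the cue
    logits = W2 @ h + b2
    p = np.exp(logits - logits.max())
    return h, p / p.sum()                 # softmax action probabilities

for trial in range(5000):
    cat = rng.integers(2)                 # hidden category (e.g. /r/- vs /l/-like)
    x = rng.normal([1.0, -1.0] if cat == 0 else [-1.0, 1.0], 0.5)
    h, p = policy(x)
    a = rng.choice(n_act, p=p)            # action taken in the "game"
    r = 1.0 if a == cat else 0.0          # reward for the correct game move
    baseline += 0.01 * (r - baseline)     # running reward baseline
    dlogits = (np.eye(n_act)[a] - p) * (r - baseline)  # REINFORCE gradient
    dh = (W2.T @ dlogits) * (1.0 - h**2)
    W2 += lr * np.outer(dlogits, h); b2 += lr * dlogits
    W1 += lr * np.outer(dh, x); b1 += lr * dh

# Rough check: the greedy policy should now match the category most of the time.
correct = 0
for _ in range(500):
    cat = rng.integers(2)
    x = rng.normal([1.0, -1.0] if cat == 0 else [-1.0, 1.0], 0.5)
    _, p = policy(x)
    correct += int(np.argmax(p) == cat)
print(f"greedy accuracy: {correct / 500:.2f}")
```
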
  • Item
    Effects of talker familiarity on speech understanding and cognitive effort in complex environments.
    (2020) Cohen, Julie; Gordon-Salant, Sandra; Brungart, Douglas S.; Hearing and Speech Sciences; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    The long-term goal of this project is to understand the cognitive mechanisms responsible for the familiar voice (FV) benefit in real-world environments, and to develop means of exploiting the FV benefit to increase the saliency of attended speech for older adults with hearing loss. Older adults and those with hearing loss have greater difficulty in noisy environments than younger adults, due in part to a reduction in available cognitive resources. When older listeners are in a challenging environment, their reduced cognitive resources (i.e., working memory and inhibitory control) can result in increased listening effort to maintain speech understanding performance. Both younger and older listeners were tested in this study to determine whether the familiar voice benefit varies with listener age under various listening conditions. Study 1 examined whether a FV improves speech understanding and working memory during a dynamic speech understanding task in a real-world setting for couples of younger and older adults. Results showed that both younger and older adults exhibited a talker familiarity benefit to speech understanding performance, but performance on a test of working memory capacity did not vary as a function of talker familiarity. Study 2 examined whether a FV improves speech understanding in a simulated cocktail-party environment in a lab setting by presenting multi-talker stimuli that were either monotic or dichotic. Both younger normal-hearing (YNH) and older normal-hearing (ONH) groups exhibited a familiarity benefit in monotic and dichotic listening conditions. However, results also showed that the talker familiarity benefit in the monotic conditions varied as a function of talker identification accuracy. When talker identification was correct, speech understanding was similar when listening to a familiar masker and when both voices were unfamiliar. However, when talker identification was incorrect, listening to a familiar masker resulted in a decline in speech understanding. Study 3 examined whether a FV improves performance on a measure of auditory working memory (a 1-back task). ONH listeners with higher working memory capacity exhibited a benefit in performance when listening to a familiar vs. unfamiliar target voice. Additionally, performance on the 1-back test varied as a function of working memory capacity and inhibitory control. Taken together, these findings show that talker familiarity is a beneficial cue that both younger and older adults can utilize when listening in complex environments, such as a restaurant or a crowded gathering. Listening to a familiar voice can improve speech understanding in noise, particularly when the noise is composed of speech. However, this benefit did not impact performance on a high memory load task. Understanding the role that familiar voices may have in the allocation of cognitive resources could result in improved aural rehabilitation strategies and may ultimately facilitate improvements in partner communication in complex real-world environments.
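
Study 3 above reports performance on a 1-back test. The abstract does not give the scoring method; the sketch below shows one common approach, computing d-prime from hits and false alarms on a 1-back sequence, offered purely as an illustrative assumption.

```python
# Assumed 1-back scoring: d-prime from hits and false alarms.
from scipy.stats import norm

def one_back_dprime(stimuli, responses):
    """stimuli: sequence of items; responses: True where listener said 'match'."""
    hits = misses = fas = crs = 0
    for i in range(1, len(stimuli)):
        is_match = stimuli[i] == stimuli[i - 1]
        if is_match and responses[i]: hits += 1
        elif is_match: misses += 1
        elif responses[i]: fas += 1
        else: crs += 1
    # Log-linear correction keeps rates away from 0 and 1.
    hr = (hits + 0.5) / (hits + misses + 1)
    far = (fas + 0.5) / (fas + crs + 1)
    return norm.ppf(hr) - norm.ppf(far)

stims = ["ba", "da", "da", "ga", "ga", "ba"]
resps = [False, False, True, False, True, True]  # final response is a false alarm
print(round(one_back_dprime(stims, resps), 2))
```
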
  • Item
    Age Effects on Perceptual Organization of Speech in Realistic Environments
    (2017) Bologna, William Joseph; Dubno, Judy R; Gordon-Salant, Sandra; Hearing and Speech Sciences; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Communication often occurs in environments where background sounds fluctuate and mask portions of the intended message. Listeners use envelope and periodicity cues to group together audible glimpses of speech and fill in missing information. When the background contains other talkers, listeners also use focused attention to select the appropriate target talker and ignore competing talkers. Whereas older adults are known to experience significantly more difficulty with these challenging tasks than younger adults, the sources of these difficulties remain unclear. In this project, three related experiments explored the effects of aging on several aspects of speech understanding in realistic listening environments. Experiments 1 and 2 determined the extent to which aging affects the benefit of envelope and periodicity cues for recognition of short glimpses of speech, phonemic restoration of missing speech segments, and/or segregation of glimpses from a competing talker. Experiment 3 investigated effects of age on the ability to focus attention on an expected voice in a two-talker environment. Twenty younger adults and 20 older adults with normal hearing participated in all three experiments and also completed a battery of cognitive measures to examine contributions of specific cognitive abilities to speech recognition. Keyword recognition and cognitive data were analyzed with an item-level logistic regression based on a generalized linear mixed model. Results indicated that older adults were poorer than younger adults at glimpsing short segments of speech but were able to use envelope and periodicity cues to facilitate phonemic restoration and speech segregation. Whereas older adults performed more poorly than younger adults overall, the groups did not differ in their ability to focus attention on an expected voice. Across all three experiments, older adults were poorer than younger adults at recognizing speech from a female talker, both in quiet and with a competing talker. Results of the cognitive tasks indicated that faster processing speed and better visual-linguistic closure were predictive of better speech understanding. Taken together, these results suggest that age-related declines in speech recognition may be partially explained by difficulty grouping short glimpses of speech into a coherent message, which may be particularly difficult for older adults when the talker is female.
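
The analysis above is an item-level logistic regression based on a generalized linear mixed model. The sketch below is a deliberately simplified, fixed-effects-only version on simulated data (no random effects for listeners or items), intended only to illustrate the item-level logistic setup; the variable names and effect sizes are invented.

```python
# Simplified item-level logistic regression on simulated keyword data.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 800
df = pd.DataFrame({
    "older": rng.integers(0, 2, n),   # 0 = younger, 1 = older
    "speed": rng.normal(0, 1, n),     # processing-speed z-score
})
# Simulate per-item keyword accuracy: worse for older listeners, better with speed.
logit = 1.0 - 0.8 * df["older"] + 0.6 * df["speed"]
df["correct"] = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

model = smf.logit("correct ~ older + speed", data=df).fit(disp=0)
print(model.params)  # coefficient signs should recover the simulated effects
```
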
  • Item
    Infants' Ability to Learn New Words Across Accent
    (2011) Panza, Sabrina; Newman, Rochelle; Hearing and Speech Sciences; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    The purpose of this study was to explore the phonetic flexibility of toddlers' early lexical representations. In this study (based on Schmale et al., 2011), toddlers' ability to generalize newly learned words across speaker accent was measured using a split-screen preferential looking paradigm. Twenty-four toddlers (mean age = 29 months) were taught two new words by a Spanish-accented speaker and later tested by a native English speaker. One word had a phonological (vocalic) change across speaker accent (e.g., [fim]/[feem]), while the other word did not (e.g., [mef]/[mef]). Toddlers looked to the correct object significantly longer than chance only when the target label did not phonemically differ across accent. However, toddlers did not look longer to the non-phonemic target variant than to the phonemic variant. High variability between subjects was noted, and the potential need for additional exposure prior to testing infants on such a contrast is discussed.
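
The chance comparison described above (looking to the correct object longer than chance) is commonly tested with a one-sample t-test against a proportion of 0.5. The sketch below illustrates that analysis on simulated looking proportions; the values and the one-tailed choice are assumptions, not the study's data or exact test.

```python
# One-sample t-test of looking proportions against chance (simulated data).
import numpy as np
from scipy.stats import ttest_1samp

rng = np.random.default_rng(2)
# Proportion of target looking for 24 simulated toddlers, non-phonemic condition.
prop_target = np.clip(rng.normal(0.58, 0.12, 24), 0, 1)

t, p = ttest_1samp(prop_target, popmean=0.5, alternative="greater")
print(f"t(23) = {t:.2f}, one-tailed p = {p:.3f}")
```
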
  • Item
    Temporal dynamics of MEG phase information during speech perception: Segmentation and neural communication using mutual information and phase locking
    (2011) Cogan, Gregory Brendan; Idsardi, William; Neuroscience and Cognitive Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    The incoming speech stream contains a rich amount of temporal information. In particular, information on slow time scales, in the delta and theta bands (125-1000 ms, 1-8 Hz), corresponds to prosodic and syllabic information, while information on faster time scales (20-40 ms, 25-50 Hz) corresponds to feature/phonemic information. In order for speech perception to occur, this signal must be segregated into meaningful units of analysis and then processed in a distributed network of brain regions. Recent evidence suggests that low frequency phase information in the delta and theta bands of the magnetoencephalography (MEG) signal plays an important role in tracking and segmenting the incoming signal into units of analysis. This thesis utilized a novel method of analysis, mutual information (MI), to characterize the relative information contributions of these low frequency phases. Reliable information pertaining to the stimulus was present in delta and theta sub-bands (3-5 Hz, 5-7 Hz), and the information within each sub-band was independent of the others. A second experiment demonstrated that the information present in these bands differed significantly for speech and a non-speech control condition, suggesting that, contrary to previous results, a purely acoustic hypothesis of this segmentation is not supported. A third experiment found that both low (delta and theta) and high (gamma) frequency information is utilized to facilitate communication between brain areas thought to underlie speech perception. Distinct auditory/speech networks that operated exclusively in these frequencies were revealed, suggesting a privileged role for these timescales in neural communication between brain regions. Taken together, these results suggest that timescales that correspond linguistically to important aspects of the speech stream also facilitate segmentation of the incoming signal and communication between the brain areas that perform neural computation.
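
The MI analysis described above pairs low-frequency MEG phase with stimulus information. The sketch below illustrates the general recipe on simulated trials: band-pass filter, extract phase with the Hilbert transform, discretize, and compute MI against stimulus identity. Band edges, bin count, and the simulated signals are assumptions, not the thesis's exact pipeline.

```python
# MI between binned low-frequency phase and stimulus identity (simulated MEG).
import numpy as np
from scipy.signal import butter, filtfilt, hilbert
from sklearn.metrics import mutual_info_score

rng = np.random.default_rng(3)
fs, n_trials, n_samp = 250, 200, 500
stim = rng.integers(0, 2, n_trials)          # two stimulus classes
t = np.arange(n_samp) / fs
# Simulated trials: theta phase weakly locked to stimulus identity, plus noise.
trials = np.array([np.cos(2 * np.pi * 5 * t + (0 if s == 0 else np.pi / 2))
                   + rng.normal(0, 1.5, n_samp) for s in stim])

b, a = butter(3, [3 / (fs / 2), 7 / (fs / 2)], btype="band")  # 3-7 Hz band
phase = np.angle(hilbert(filtfilt(b, a, trials, axis=1), axis=1))

# MI between stimulus identity and binned phase at one post-onset time point.
bins = np.digitize(phase[:, 250], np.linspace(-np.pi, np.pi, 9))
print(f"MI at t = 1 s: {mutual_info_score(stim, bins):.3f} nats")
```
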
  • Item
    On The Way To Linguistic Representation: Neuromagnetic Evidence of Early Auditory Abstraction in the Perception of Speech and Pitch
    (2009) Monahan, Philip Joseph; Idsardi, William J; Poeppel, David E; Linguistics; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    The goal of this dissertation is to show that even at the earliest (non-invasive) recordable stages of auditory cortical processing, we find evidence that cortex is calculating abstract representations from the acoustic signal. Looking across two distinct domains (inferential pitch perception and vowel normalization), I present evidence demonstrating that the M100, an automatic evoked neuromagnetic component that localizes to primary auditory cortex, is sensitive to abstract computations. The M100 typically responds to physical properties of the stimulus in auditory and speech perception and integrates only over the first 25 to 40 ms of stimulus onset, providing a reliable dependent measure that allows us to tap into early stages of auditory cortical processing. In Chapter 2, I briefly present the episodicist position on speech perception and discuss research indicating that the strongest episodicist position is untenable. I then review findings from the mismatch negativity literature, where proposals have been made that the MMN allows access into linguistic representations supported by auditory cortex. Finally, I conclude the chapter with a discussion of previous findings on the M100/N1. In Chapter 3, I present neuromagnetic data showing that the response properties of the M100 are sensitive to the missing fundamental component using well-controlled stimuli. These findings suggest that listeners are reconstructing the inferred pitch by 100 ms after stimulus onset. In Chapter 4, I propose a novel formant ratio algorithm in which the third formant (F3) is the normalizing factor. The goal of formant ratio proposals is to provide an explicit algorithm that successfully "eliminates" speaker-dependent acoustic variation of auditory vowel tokens. Results from two MEG experiments suggest that auditory cortex is sensitive to formant ratios and that the perceptual system shows heightened sensitivity to tokens located in more densely populated regions of the vowel space. In Chapter 5, I report MEG results suggesting that early auditory cortical processing is sensitive to violations of a phonological constraint on sound sequencing: listeners make highly specific, knowledge-based predictions about rather abstract anticipated properties of the upcoming speech signal, and violations of these predictions are evident in early cortical processing.
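
One plausible reading of the F3-normalization algorithm described in Chapter 4 above is that each token's F1 and F2 are divided by its own F3. The sketch below illustrates that reading using the classic Peterson and Barney (1952) average formant values for /i/; whether this matches the dissertation's exact formulation is an assumption.

```python
# F3-based formant-ratio normalization (one plausible reading of the algorithm).
def normalize_by_f3(f1, f2, f3):
    """Return speaker-normalized formant ratios (F1/F3, F2/F3)."""
    return f1 / f3, f2 / f3

# The same vowel /i/ from an average male talker and an average female talker:
male = normalize_by_f3(270, 2290, 3010)
female = normalize_by_f3(310, 2790, 3310)
print(male)    # the ratios are more comparable than the raw formant values,
print(female)  # reducing speaker-dependent variation across the two tokens
```
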