Neural and computational approaches to auditory scene analysis

Akram, Sahar
Shamma, Shihab A
Our perception of the world depends heavily on the brain's complex processing of sensory inputs. Hearing is one of those seemingly effortless sensory tasks that enables us to perceive the auditory world and integrate acoustic information from the environment into cognitive experiences. The main purpose of studying the auditory system is to shed light on the neural mechanisms underlying our hearing ability. Understanding how the brain systematically performs such complicated tasks is an ultimate goal with numerous clinical and intellectual applications. In this thesis, we take advantage of various experimental and computational approaches to understand how the brain analyzes complex auditory scenes. We first focus on investigating the behavioral and neural mechanisms underlying auditory sound segregation, also known as auditory streaming. Employing an informational masking paradigm, we explore the interaction between stimulus-driven and task-driven attentional processes in the auditory cortex using magnetoencephalography (MEG) recordings from the human brain. The results demonstrate close links between the perceptual and neural consequences of auditory stream segregation, suggesting that the neural activity can be viewed as an indicator of the auditory streaming percept. We then examine more realistic auditory scenarios consisting of two speakers simultaneously present in an auditory scene and introduce a novel computational approach for decoding the attentional state of listeners in such environments. The proposed model focuses on an efficient implementation of a decoder for tracking the cognitive state of the brain, inspired by the neural representation of auditory objects in the auditory cortex. The structure is based on a state-space model with the recorded MEG signal and individual speech envelopes as the input and the probability of attending to the target speaker as the output of the model.
The proposed approach benefits from accurate, temporally well-resolved estimation of the attentional state as well as the inherent model-based dynamic denoising of the underlying state-space model, which makes it possible to reliably decode the attentional state under very low SNR conditions. As part of this research work, we also investigate the neural representation of ambiguous auditory stimuli at the level of the auditory cortex. In perceiving a typical auditory scene, we may receive incomplete or ambiguous auditory information from the environment. This can lead to multiple interpretations of the same acoustic scene and the formation of an ambiguous perceptual state in the brain. Here, in a series of experimental studies, we focus on a particular example of an ambiguous stimulus (the ambiguous Shepard tone pair) and investigate the neural correlates of the contextual effect and perceptual biasing using MEG. The results from psychoacoustic and neural recordings suggest a set of hypotheses about the underlying neural mechanisms of short-term memory and expectation modulation in the nervous system.
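As a rough illustration of the state-space idea summarized above (neural data and speech envelopes in, a time-resolved probability of attending to one speaker out), the following is a minimal sketch and not the estimator developed in the thesis. It swaps in a simple two-state hidden Markov forward filter, and it assumes that an envelope has already been reconstructed from the MEG signal by some decoder that is not shown. The function names and the parameters `stay`, `mu`, `sigma`, and `win` are hypothetical choices made only for this example, and the data are synthetic.

```python
import numpy as np


def window_correlations(decoded, envelope, win):
    """Pearson correlation of two signals over consecutive non-overlapping windows."""
    n_windows = len(decoded) // win
    corr = np.empty(n_windows)
    for k in range(n_windows):
        seg = slice(k * win, (k + 1) * win)
        corr[k] = np.corrcoef(decoded[seg], envelope[seg])[0, 1]
    return corr


def attention_posterior(r1, r2, stay=0.95, mu=0.3, sigma=0.3):
    """Forward-filtered P(attending to speaker 1) for each window.

    Hypothetical model (an assumption of this sketch): the hidden state
    (attended speaker) persists between windows with probability `stay`;
    the observed correlation difference r1 - r2 is Gaussian with mean
    +mu when speaker 1 is attended and -mu when speaker 2 is attended.
    """
    d = np.asarray(r1) - np.asarray(r2)
    post = np.empty(len(d))
    p = 0.5  # uninformative initial belief
    for k, dk in enumerate(d):
        # Prediction step: propagate the belief through the two-state dynamics.
        prior = stay * p + (1.0 - stay) * (1.0 - p)
        # Update step: weight each state by its Gaussian likelihood.
        like1 = np.exp(-((dk - mu) ** 2) / (2.0 * sigma**2))
        like2 = np.exp(-((dk + mu) ** 2) / (2.0 * sigma**2))
        p = prior * like1 / (prior * like1 + (1.0 - prior) * like2)
        post[k] = p
    return post


# Synthetic demo: attention switches from speaker 1 to speaker 2 halfway through.
rng = np.random.default_rng(0)
n, win = 4000, 100
env1, env2 = rng.standard_normal(n), rng.standard_normal(n)
attended = np.concatenate([env1[: n // 2], env2[n // 2:]])
decoded = attended + rng.standard_normal(n)  # noisy stand-in for an MEG-based reconstruction
posterior = attention_posterior(
    window_correlations(decoded, env1, win),
    window_correlations(decoded, env2, win),
)
print(np.round(posterior, 2))
```

The persistence term `stay` is what gives a filter of this kind its smoothing behavior: a single noisy window cannot flip the estimate on its own, which is a simplified analogue of the model-based denoising mentioned in the abstract.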