Fast Numerical and Machine Learning Algorithms for Spatial Audio Reproduction

Luo, Yuancheng

dc.contributor.advisor	Duraiswami, Ramani	en_US
dc.contributor.author	Luo, Yuancheng	en_US
dc.contributor.department	Computer Science	en_US
dc.contributor.publisher	Digital Repository at the University of Maryland	en_US
dc.contributor.publisher	University of Maryland (College Park, Md.)	en_US
dc.date.accessioned	2014-10-11T05:51:40Z
dc.date.available	2014-10-11T05:51:40Z
dc.date.issued	2014	en_US
dc.description.abstract	Audio reproduction technologies have undergone several revolutions, from a purely mechanical, to an electromagnetic, and then to a digital process. These changes have resulted in steady improvements in the objective qualities of sound capture/playback on increasingly portable devices. However, most mobile playback devices remove important spatial-directional components of externalized sound which are natural to the subjective experience of human hearing. Fortunately, the missing spatial-directional parts can be integrated back into audio through a combination of computational methods and physical knowledge of how sound scatters off of the listener's anthropometry in the sound-field. The former employs signal processing techniques for rendering the sound-field. The latter employs approximations of the sound-field through the measurement of so-called Head-Related Impulse Responses/Transfer Functions (HRIRs/HRTFs). This dissertation develops several numerical and machine learning algorithms for accelerating and personalizing spatial audio reproduction in light of available mobile computing power. First, spatial audio synthesis between a sound-source and sound-field requires fast convolution algorithms between the audio-stream and the HRIRs. We introduce a novel sparse decomposition algorithm for HRIRs based on non-negative matrix factorization that allows for faster time-domain convolution than frequency-domain fast-Fourier-transform variants. Second, the full sound-field over the spherical coordinate domain must be efficiently approximated from a finite collection of HRTFs. We develop a joint spatial-frequency covariance model for Gaussian process regression (GPR) and sparse-GPR methods that supports the fast interpolation and data fusion of HRTFs across multiple data-sets. Third, the direct measurement of HRTFs requires specialized equipment that is unsuited for widespread acquisition. We "bootstrap" the human ability to localize sound in listening tests with Gaussian process active-learning techniques over graphical user interfaces that allow the listener to infer his/her own HRTFs. Experiments are conducted on publicly available HRTF datasets and human listeners.	en_US
dc.identifier	https://doi.org/10.13016/M2D885
dc.identifier.uri	http://hdl.handle.net/1903/15784
dc.language.iso	en	en_US
dc.subject.pqcontrolled	Computer science	en_US
dc.subject.pqcontrolled	Computer engineering	en_US
dc.subject.pqcontrolled	Mathematics	en_US
dc.subject.pquncontrolled	Active-Learning	en_US
dc.subject.pquncontrolled	Gaussian Processes	en_US
dc.subject.pquncontrolled	Head-related Transfer Functions	en_US
dc.subject.pquncontrolled	Non-negative least squares	en_US
dc.subject.pquncontrolled	Non-negative Matrix Factorization	en_US
dc.subject.pquncontrolled	Sound-source Localization	en_US
dc.title	Fast Numerical and Machine Learning Algorithms for Spatial Audio Reproduction	en_US
dc.type	Dissertation	en_US

Files

Original bundle

Name: Luo_umd_0117E_15521.pdf
Size: 8.58 MB
Format: Adobe Portable Document Format

Collections

UMD Theses and Dissertations
Computer Science Theses and Dissertations