Fast Numerical and Machine Learning Algorithms for Spatial Audio Reproduction
dc.contributor.advisor | Duraiswami, Ramani | en_US |
dc.contributor.author | Luo, Yuancheng | en_US |
dc.contributor.department | Computer Science | en_US |
dc.contributor.publisher | Digital Repository at the University of Maryland | en_US |
dc.contributor.publisher | University of Maryland (College Park, Md.) | en_US |
dc.date.accessioned | 2014-10-11T05:51:40Z | |
dc.date.available | 2014-10-11T05:51:40Z | |
dc.date.issued | 2014 | en_US |
dc.description.abstract | Audio reproduction technologies have undergone several revolutions, from a purely mechanical process, to an electromagnetic one, and then to a digital one. These changes have resulted in steady improvements in the objective qualities of sound capture/playback on increasingly portable devices. However, most mobile playback devices remove important spatial-directional components of externalized sound which are natural to the subjective experience of human hearing. Fortunately, the missing spatial-directional parts can be integrated back into audio through a combination of computational methods and physical knowledge of how sound scatters off the listener's anthropometry in the sound-field. The former employs signal processing techniques for rendering the sound-field. The latter employs approximations of the sound-field through the measurement of so-called Head-Related Impulse Responses/Transfer Functions (HRIRs/HRTFs). This dissertation develops several numerical and machine learning algorithms for accelerating and personalizing spatial audio reproduction in light of available mobile computing power. First, spatial audio synthesis between a sound-source and sound-field requires fast convolution algorithms between the audio-stream and the HRIRs. We introduce a novel sparse decomposition algorithm for HRIRs based on non-negative matrix factorization that allows for faster time-domain convolution than frequency-domain fast-Fourier-transform variants. Second, the full sound-field over the spherical coordinate domain must be efficiently approximated from a finite collection of HRTFs. We develop a joint spatial-frequency covariance model for Gaussian process regression (GPR) and sparse-GPR methods that supports the fast interpolation and data fusion of HRTFs across multiple datasets. Third, the direct measurement of HRTFs requires specialized equipment that is unsuited for widespread acquisition. We "bootstrap" the human ability to localize sound in listening tests with Gaussian process active-learning techniques over graphical user interfaces that allow listeners to infer their own HRTFs. Experiments are conducted on publicly available HRTF datasets and human listeners. | en_US |
dc.identifier | https://doi.org/10.13016/M2D885 | |
dc.identifier.uri | http://hdl.handle.net/1903/15784 | |
dc.language.iso | en | en_US |
dc.subject.pqcontrolled | Computer science | en_US |
dc.subject.pqcontrolled | Computer engineering | en_US |
dc.subject.pqcontrolled | Mathematics | en_US |
dc.subject.pquncontrolled | Active-Learning | en_US |
dc.subject.pquncontrolled | Gaussian Processes | en_US |
dc.subject.pquncontrolled | Head-related Transfer Functions | en_US |
dc.subject.pquncontrolled | Non-negative least squares | en_US |
dc.subject.pquncontrolled | Non-negative Matrix Factorization | en_US |
dc.subject.pquncontrolled | Sound-source Localization | en_US |
dc.title | Fast Numerical and Machine Learning Algorithms for Spatial Audio Reproduction | en_US |
dc.type | Dissertation | en_US |