Fast Numerical and Machine Learning Algorithms for Spatial Audio Reproduction

dc.contributor.advisor: Duraiswami, Ramani
dc.contributor.author: Luo, Yuancheng
dc.contributor.department: Computer Science
dc.contributor.publisher: Digital Repository at the University of Maryland
dc.contributor.publisher: University of Maryland (College Park, Md.)
dc.date.accessioned: 2014-10-11T05:51:40Z
dc.date.available: 2014-10-11T05:51:40Z
dc.date.issued: 2014
dc.description.abstract: Audio reproduction technologies have undergone several revolutions, from a purely mechanical process to an electromagnetic and finally a digital one. These changes have brought steady improvements in the objective quality of sound capture and playback on increasingly portable devices. However, most mobile playback devices discard the spatial-directional components of externalized sound that are natural to the subjective experience of human hearing. Fortunately, the missing spatial-directional components can be restored to audio through a combination of computational methods and physical knowledge of how sound scatters off the listener's anthropometry in the sound-field. The former employs signal-processing techniques for rendering the sound-field; the latter approximates the sound-field through measurements of so-called Head-Related Impulse Responses/Transfer Functions (HRIRs/HRTFs). This dissertation develops several numerical and machine learning algorithms for accelerating and personalizing spatial audio reproduction in light of available mobile computing power. First, spatial audio synthesis between a sound-source and the sound-field requires fast convolution between the audio stream and the HRIRs. We introduce a novel sparse decomposition algorithm for HRIRs, based on non-negative matrix factorization, that allows for faster time-domain convolution than frequency-domain fast-Fourier-transform variants. Second, the full sound-field over the spherical coordinate domain must be efficiently approximated from a finite collection of HRTFs. We develop a joint spatial-frequency covariance model for Gaussian process regression (GPR) and sparse-GPR methods that supports fast interpolation and data fusion of HRTFs across multiple datasets. Third, direct measurement of HRTFs requires specialized equipment that is unsuited for widespread acquisition. We "bootstrap" the human ability to localize sound in listening tests with Gaussian process active-learning techniques over graphical user interfaces that allow listeners to infer their own HRTFs. Experiments are conducted on publicly available HRTF datasets and with human listeners.
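The first contribution rests on a simple cost argument: if an HRIR can be decomposed so that only a few taps are nonzero, time-domain convolution costs proportional to the number of nonzero taps rather than the full filter length, which can beat FFT-based convolution for very sparse filters. A minimal sketch of that idea (illustrative only, not the dissertation's NMF-based decomposition; the 3-tap filter here is hypothetical):

```python
# Illustrative sketch: time-domain convolution with a sparse filter
# touches only the nonzero taps, so its cost scales with the number of
# nonzeros rather than with the filter length.
import numpy as np

def sparse_convolve(x, h):
    """Convolve signal x with filter h, skipping the zero taps of h."""
    y = np.zeros(len(x) + len(h) - 1)
    for k in np.flatnonzero(h):          # only nonzero taps contribute
        y[k:k + len(x)] += h[k] * x
    return y

rng = np.random.default_rng(0)
x = rng.standard_normal(256)             # audio block
h = np.zeros(128)                        # hypothetical sparse HRIR
h[[3, 17, 60]] = [0.9, -0.4, 0.2]        # only 3 of 128 taps are nonzero

y = sparse_convolve(x, h)                # 3 scaled shifts instead of 128
```

The result matches dense convolution (`np.convolve(x, h)`) exactly; the savings come purely from skipping zero taps.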
dc.identifier: https://doi.org/10.13016/M2D885
dc.identifier.uri: http://hdl.handle.net/1903/15784
dc.language.iso: en
dc.subject.pqcontrolled: Computer science
dc.subject.pqcontrolled: Computer engineering
dc.subject.pqcontrolled: Mathematics
dc.subject.pquncontrolled: Active-Learning
dc.subject.pquncontrolled: Gaussian Processes
dc.subject.pquncontrolled: Head-related Transfer Functions
dc.subject.pquncontrolled: Non-negative least squares
dc.subject.pquncontrolled: Non-negative Matrix Factorization
dc.subject.pquncontrolled: Sound-source Localization
dc.title: Fast Numerical and Machine Learning Algorithms for Spatial Audio Reproduction
dc.type: Dissertation

Files

Original bundle

Name: Luo_umd_0117E_15521.pdf
Size: 8.58 MB
Format: Adobe Portable Document Format