Efficient Acoustic Simulation for Learning-based Virtual and Real-World Audio Processing

Tang, Zhenyu

Files

Tang_umd_0117E_22336.pdf (22.92 MB)

Date

2022

Authors

Tang, Zhenyu

Advisor

Manocha, Dinesh

DRUM DOI

https://doi.org/10.13016/ggfz-e9k2

Abstract

Sound is commonly understood as air pressure perturbations caused by vibrating or moving objects. Sound energy is attenuated as it travels through the air over a distance and as it is absorbed at the surfaces of other objects. Numerous researchers have focused on devising better acoustic simulation methods to model sound propagation more realistically. The benefits of accurate acoustic simulation include, but are not limited to, computer-aided acoustic design, acoustic optimization, synthetic speech data generation, and immersive audio-visual rendering for mixed reality. However, acoustic simulation has been underexplored for relevant virtual and real-world audio processing applications. The main challenges in adopting accurate acoustic simulation methods are the tradeoff between accuracy and time and space cost, and the difficulty of acquiring and reconstructing acoustic scenes in the real world.

In this dissertation, we propose novel methods to overcome the above challenges by leveraging the inferential power of deep neural networks and combining them with interactive acoustic simulation techniques. First, we develop a neural network model that can learn the acoustic scattering fields of different objects given their 3D representations as input. This work facilitates the inclusion of wave acoustic scattering effects in interactive sound rendering applications, which used to be difficult without intensive pre-computation. Second, we incorporate a deep acoustic analysis neural network into the sound rendering pipeline to allow the generation of sounds that are perceptually consistent with real-world sounds. This is achieved by predicting acoustic parameters at run-time from real-world audio samples and optimizing simulation parameters accordingly. Finally, we build a pipeline that uses general 3D indoor scene datasets to generate high-quality acoustic room impulse responses, and we demonstrate the usefulness of the generated data on several practical speech processing tasks. Our results demonstrate that by leveraging state-of-the-art physics-based acoustic simulation and deep learning techniques, realistic simulated data can be generated to enhance sound rendering quality in the virtual world and boost the performance of audio processing tasks in the real world.

URI (handle)

http://hdl.handle.net/1903/28949

Collections

UMD Theses and Dissertations
Computer Science Theses and Dissertations
