Efficient Acoustic Simulation for Learning-based Virtual and Real-World Audio Processing

Date

2022

Abstract

Sound propagates as air-pressure perturbations caused by vibrating or moving objects. Sound energy is attenuated as it travels through the air and as it is absorbed at the surfaces of other objects. Numerous researchers have focused on devising better acoustic simulation methods to model sound propagation more realistically. The benefits of accurate acoustic simulation include, but are not limited to, computer-aided acoustic design, acoustic optimization, synthetic speech data generation, and immersive audio-visual rendering for mixed reality. However, acoustic simulation remains underexplored in relevant virtual and real-world audio processing applications. The main challenges in adopting accurate acoustic simulation methods are the tradeoff between accuracy and time and space cost, and the difficulty of acquiring and reconstructing acoustic scenes in the real world.

In this dissertation, we propose novel methods to overcome these challenges by leveraging the inferential power of deep neural networks and combining them with interactive acoustic simulation techniques. First, we develop a neural network model that learns the acoustic scattering fields of different objects, taking their 3D representations as input. This work facilitates the inclusion of wave acoustic scattering effects in interactive sound rendering applications, which used to be difficult without intensive pre-computation. Second, we incorporate a deep acoustic analysis neural network into the sound rendering pipeline to generate sounds that are perceptually consistent with real-world sounds. This is achieved by predicting acoustic parameters at run-time from real-world audio samples and optimizing the simulation parameters accordingly. Finally, we build a pipeline that uses general 3D indoor scene datasets to generate high-quality acoustic room impulse responses, and we demonstrate the usefulness of the generated data on several practical speech processing tasks. Our results demonstrate that, by leveraging state-of-the-art physics-based acoustic simulation and deep learning techniques, realistic simulated data can be generated to enhance sound rendering quality in the virtual world and to boost the performance of audio processing tasks in the real world.
