AVISARME: Audio Visual Synchronization Algorithm for a Robotic Musician Ensemble
Berman, David Ross
This thesis presents a beat detection algorithm that combines audio and visual inputs to synchronize a robotic musician with its human counterpart. Although considerable work has been done on sophisticated methods for audio beat detection, the visual aspect of musicianship has been largely ignored. Advances in image processing techniques, and in both computing and imaging technologies, have recently made it feasible to integrate visual inputs into beat detection algorithms. The proposed method for audio tempo detection also attempts to solve many issues present in current algorithms. Current audio-only algorithms each have shortcomings: they are inaccurate, too computationally expensive, or suffer from poor resolution. In experimental testing on both a popular music database and simulated music signals, the proposed algorithm performed statistically better in both accuracy and robustness than the baseline approaches. The proposed approach is also efficient, taking only 45 ms to process a 2.5 s signal, and maintains a high temporal resolution of 0.125 BPM. The visual integration relies on Full Scene Tracking, allowing it to be used for live beat detection with practically all musicians and instruments. Numerous optimization techniques have been implemented, such as pyramidal optimization (PO) and clustering techniques, and these are presented in this thesis. A Temporal Difference (TD) learning approach to sensor fusion and beat synchronization is also proposed and tested thoroughly. This TD learning algorithm implements a novel policy switching criterion that provides a stable yet quickly reacting estimate of tempo. The proposed algorithm has been implemented and tested on a robotic drummer to verify the validity of the approach. The test results are documented in detail and compared with previously proposed approaches.