Sparse and Nonnegative Factorizations For Music Understanding

dc.contributor.advisor | Liu, K. J. Ray | en_US
dc.contributor.author | Tjoa, Steven Kiemyang | en_US
dc.contributor.department | Electrical Engineering | en_US
dc.contributor.publisher | Digital Repository at the University of Maryland | en_US
dc.contributor.publisher | University of Maryland (College Park, Md.) | en_US
dc.date.accessioned | 2011-10-08T06:35:54Z
dc.date.available | 2011-10-08T06:35:54Z
dc.date.issued | 2011 | en_US
dc.description.abstract | In this dissertation, we propose methods for sparse and nonnegative factorization that are specifically suited for analyzing musical signals. First, we discuss two constraints that aid factorization of musical signals: harmonic and co-occurrence constraints. We propose a novel dictionary learning method that imposes harmonic constraints upon the atoms of the learned dictionary while allowing the dictionary size to grow appropriately during the learning procedure. When there is significant spectral-temporal overlap among the musical sources, our method outperforms popular existing matrix factorization methods as measured by the recall and precision of learned dictionary atoms. We also propose co-occurrence constraints -- three simple and convenient multiplicative update rules for nonnegative matrix factorization (NMF) that enforce dependence among atoms. Using examples in music transcription, we demonstrate the ability of these updates to represent each musical note with multiple atoms and cluster the atoms for source separation purposes. Second, we study how spectral and temporal information extracted by nonnegative factorizations can improve musical instrument recognition. Musical instrument recognition in melodic signals is difficult, especially for classification systems that rely entirely upon spectral information instead of temporal information. Here, we propose a simple and effective method of combining spectral and temporal information for instrument recognition. While existing classification methods use traditional features such as statistical moments, we extract novel features from spectral and temporal atoms generated by NMF using a biologically motivated multiresolution gamma filterbank. Unlike other methods that require thresholds, safeguards, and hierarchies, the proposed spectral-temporal method requires only simple filtering and a flat classifier.
Finally, we study how to perform sparse factorization when a large dictionary of musical atoms is already known. Sparse coding methods such as matching pursuit (MP) have been applied to problems in music information retrieval such as transcription and source separation with moderate success. However, when the set of dictionary atoms is large, identification of the best match in the dictionary with the residual is slow -- linear in the size of the dictionary. Here, we propose a variant called approximate matching pursuit (AMP) that is faster than MP while maintaining scalability and accuracy. Unlike MP, AMP uses an approximate nearest-neighbor (ANN) algorithm to find the closest match in a dictionary in sublinear time. One such ANN algorithm, locality-sensitive hashing (LSH), is a probabilistic hash algorithm that places similar, yet not identical, observations into the same bin. While the accuracy of AMP is comparable to similar MP methods, the computational complexity is reduced. Also, by using LSH, this method scales easily; the dictionary can be expanded without reorganizing any data structures. | en_US
dc.identifier.uri | http://hdl.handle.net/1903/12072
dc.subject.pqcontrolled | Electrical engineering | en_US
dc.subject.pqcontrolled | Computer science | en_US
dc.subject.pqcontrolled | Music | en_US
dc.subject.pquncontrolled | Dictionary Learning | en_US
dc.subject.pquncontrolled | Music Information Retrieval | en_US
dc.subject.pquncontrolled | Music Transcription | en_US
dc.subject.pquncontrolled | Nonnegative Matrix Factorization | en_US
dc.subject.pquncontrolled | Source Separation | en_US
dc.subject.pquncontrolled | Sparse Coding | en_US
dc.title | Sparse and Nonnegative Factorizations For Music Understanding | en_US
dc.type | Dissertation | en_US
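
Illustrative sketch (not part of the repository record)

The abstract names two baseline techniques that the dissertation builds on: NMF with multiplicative updates, and matching pursuit with a linear scan over the dictionary. The sketch below shows only those textbook baselines, assuming the standard Lee-Seung updates for the Frobenius-norm objective and a dictionary with unit-norm columns. It does not reproduce the dissertation's harmonic or co-occurrence constraints, nor its approximate matching pursuit. The function names `nmf_multiplicative` and `matching_pursuit` and all parameters are hypothetical, chosen for illustration.

```python
import numpy as np


def nmf_multiplicative(V, rank, n_iter=200, eps=1e-9, seed=0):
    """Textbook Lee-Seung multiplicative updates for V ~= W @ H.

    Assumption: plain Frobenius-norm NMF, without the harmonic or
    co-occurrence constraints described in the abstract.
    """
    rng = np.random.default_rng(seed)
    n_features, n_frames = V.shape
    # Nonnegative random initialization; the updates preserve nonnegativity.
    W = rng.random((n_features, rank)) + eps
    H = rng.random((rank, n_frames)) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


def matching_pursuit(x, D, n_atoms):
    """Plain matching pursuit over a dictionary D with unit-norm columns.

    Each iteration scans every atom (one matrix-vector product), which is
    the linear-in-dictionary-size cost the abstract refers to.
    """
    residual = np.asarray(x, dtype=float).copy()
    coeffs = np.zeros(D.shape[1])
    for _ in range(n_atoms):
        correlations = D.T @ residual              # inner product with every atom
        k = int(np.argmax(np.abs(correlations)))   # best-matching atom
        coeffs[k] += correlations[k]
        residual -= correlations[k] * D[:, k]      # remove its contribution
    return coeffs, residual


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Synthetic nonnegative "spectrogram" with exact rank 4 (made-up data).
    V = rng.random((20, 4)) @ rng.random((4, 30))
    W, H = nmf_multiplicative(V, rank=4)
    print("relative NMF error:", np.linalg.norm(V - W @ H) / np.linalg.norm(V))

    # Random unit-norm dictionary and a signal built from two of its atoms.
    D = rng.standard_normal((16, 40))
    D /= np.linalg.norm(D, axis=0)
    x = 2.0 * D[:, 3] - 1.5 * D[:, 17]
    coeffs, residual = matching_pursuit(x, D, n_atoms=5)
    print("residual norm:", np.linalg.norm(residual))
```

In this sketch, the `argmax` over `D.T @ residual` is the step that an approximate nearest-neighbor lookup would replace in a sublinear variant; how the dissertation does this with LSH is not shown here.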

Files

Original bundle
Name: Tjoa_umd_0117E_12349.pdf
Size: 6.18 MB
Format: Adobe Portable Document Format