Ph.D. Dissertation Defense: Steven Tjoa

Thursday, May 12, 2011
11:00 a.m.
KIM 2211
Maria Hoo
301 405 3681
mch@umd.edu

ANNOUNCEMENT: Ph.D. Dissertation Defense

Name: Steven Tjoa

Committee:

Professor K. J. Ray Liu, Chair/Advisor

Professor Min Wu

Professor Rama Chellappa

Professor Jonathan Z. Simon

Professor Lawrence C. Washington, Dean's Representative

Date/Time: Thursday, May 12, 2011, 11:00 am

Location: KIM 2211

Title: Sparse and Nonnegative Factorizations for Music Understanding

Abstract:

In this dissertation, we propose methods for sparse and nonnegative factorization that are specifically suited for analyzing musical signals. First, we discuss two constraints that aid factorization of musical signals: harmonic and co-occurrence constraints. We propose a novel dictionary learning method that imposes harmonic constraints upon the atoms of the learned dictionary while allowing the dictionary size to grow appropriately during the learning procedure. When there is significant spectral-temporal overlap among the musical sources, our method outperforms popular existing matrix factorization methods as measured by the recall and precision of learned dictionary atoms. We also propose co-occurrence constraints -- three simple and convenient multiplicative update rules for NMF that enforce dependence among atoms.

Using examples in music transcription, we demonstrate the ability of these updates to represent each musical note with multiple atoms and cluster the atoms for source separation purposes.

Second, we study how spectral and temporal information extracted by nonnegative factorizations can improve upon musical instrument recognition. Musical instrument recognition in melodic signals is difficult, especially for classification systems that rely entirely upon spectral information instead of temporal information. Here, we propose a simple and effective method of combining spectral and temporal information for instrument recognition. While existing classification methods use traditional features such as statistical moments, we extract novel features from spectral and temporal atoms generated by NMF using a biologically motivated multiresolution gamma filterbank. Unlike other methods that require thresholds, safeguards, and hierarchies, the proposed spectral-temporal method requires only simple filtering and a flat classifier.

Finally, we study how to perform sparse factorization when a large dictionary of musical atoms is already known. Sparse coding methods such as matching pursuit (MP) have been applied to problems in music information retrieval such as transcription and source separation with moderate success. However, when the set of dictionary atoms is large, identification of the best match in the dictionary with the residual is slow -- linear in the size of the dictionary. Here, we propose a variant called approximate matching pursuit (AMP) that is faster than MP while maintaining scalability and accuracy. Unlike MP, AMP uses an approximate nearest-neighbor (ANN) algorithm to find the closest match in a dictionary in sublinear time. One such ANN algorithm, locality-sensitive hashing (LSH), is a probabilistic hash algorithm that places similar, yet not identical, observations into the same bin.

While the accuracy of AMP is comparable to similar MP methods, the computational complexity is reduced. Also, by using LSH, this method scales easily; the dictionary can be expanded without reorganizing any data structures.

Audience: Graduate  Faculty 

remind we with google calendar

 

April 2024

SU MO TU WE TH FR SA
31 1 2 3 4 5 6
7 8 9 10 11 12 13
14 15 16 17 18 19 20
21 22 23 24 25 26 27
28 29 30 1 2 3 4
Submit an Event