An important problem in data analysis is finding representations that make hidden structure in the data explicit. Non-negative data, the focus of this talk, occurs in many application domains, such as audio (energy in spectrograms) and image analysis (pixel intensities). Non-negative Matrix Factorization (NMF) and non-negative ICA/PCA are among the basis decomposition techniques that have been used in this context for two-dimensional data. However, they are not effective at analyzing complex structure within data, such as local repetition of transformations of smaller patterns, and they do not scale to multi-dimensional data. They are also generally not amenable to the incorporation of prior statistical knowledge about the data during analysis.
In this talk, we present a general framework of probabilistic latent variable models for analyzing non-negative data of arbitrary dimensions. Data are modeled as histograms drawn from multivariate probability distributions over the support (rather than the value) of the data. Rather than explicitly modeling the data, we attempt to characterize these underlying distributions. The distributions are modeled as mixtures of components indexed by a latent variable; the components can be interpreted as being composed of "bases" or "kernels" and corresponding "mixing proportions". The probabilistic framework allows us to derive efficient inference algorithms for estimating model parameters. The statistical foundation also enables us to extend the model to allow multi-dimensional kernels and transformations of kernels such as shifts and rotations.
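As a rough sketch (not the speakers' implementation), the two-dimensional case of such a model can be written P(f, t) = Σ_z P(f|z) P(z, t), where the columns of W hold the spectral "bases" P(f|z) and H holds the "mixing proportions" P(z, t). The function name `plca`, the initialization, and the iteration count below are illustrative assumptions; the EM updates follow the standard form for this kind of latent-component model:

```python
import numpy as np

def plca(V, K, iters=300, seed=0):
    """EM for a 2-D latent component model P(f, t) = sum_z P(f|z) P(z, t).
    V is a non-negative matrix treated as a histogram over (f, t) bins."""
    rng = np.random.default_rng(seed)
    F, T = V.shape
    P = V / V.sum()                               # data as a distribution
    W = rng.random((F, K)); W /= W.sum(axis=0)    # bases P(f|z), columns sum to 1
    H = rng.random((K, T)); H /= H.sum()          # mixing proportions P(z, t)
    for _ in range(iters):
        R = W @ H                                 # current model of P(f, t)
        Q = P / np.maximum(R, 1e-12)              # ratio used in the E-step
        W_new = W * (Q @ H.T)                     # accumulate posterior mass per basis
        H_new = H * (W.T @ Q)                     # accumulate posterior mass per (z, t)
        W = W_new / W_new.sum(axis=0)             # re-normalize P(f|z)
        H = H_new / H_new.sum()                   # re-normalize P(z, t)
    return W, H
```

On an exactly low-rank histogram, these updates recover a factorization whose product reproduces the data up to normalization, which is the sense in which the model "characterizes the underlying distribution" rather than the raw values.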
The accuracy of the decomposition increases with the number of components, but the "expressiveness" or "information" of individual components is greater when they are fewer in number. To balance this trade-off, we use a sparse-overcomplete representation scheme in which a large population of basis vectors is used to express the input space, but only a few are required to describe any particular data item. We use entropy as a sparsity metric; the statistical foundation of the framework allows us to impose sparsity on any distribution in the model through an entropic prior and MAP estimation.
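The use of entropy as a sparsity metric can be illustrated with a small sketch (the names and numbers are illustrative, not from the talk): a mixing distribution concentrated on a few components has lower entropy than one spread evenly over all of them, which is the behavior an entropic prior rewards under MAP estimation.

```python
import numpy as np

def entropy(p):
    """Shannon entropy (in nats) of a distribution given as weights."""
    p = np.asarray(p, dtype=float)
    p = p / p.sum()          # normalize to a probability distribution
    nz = p[p > 0]            # 0 * log 0 is taken as 0
    return -np.sum(nz * np.log(nz))

dense  = [0.25, 0.25, 0.25, 0.25]   # every component used equally
sparse = [0.94, 0.02, 0.02, 0.02]   # one component dominates
# lower entropy <=> sparser mixing distribution
```

The uniform distribution attains the maximum entropy log(4), while the concentrated one scores much lower, so penalizing entropy drives most of the probability mass onto a small subset of the overcomplete basis set.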
We discuss the utility of this framework by focusing on single-channel audio processing applications. Time-frequency representations of acoustic signals can be analyzed by the models to extract structure characteristic of the sounds. This can be used in a supervised setting for applications such as source separation and in a semi-supervised setting for denoising. Unlike approaches based on time-frequency masks, which reconstruct partial spectral descriptions of sources by identifying time-frequency bins in which a source dominates, this approach reconstructs entire spectral descriptions of all sources. We present some example results and conclude with a brief discussion of ongoing work and future research directions.
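A minimal sketch of the supervised case, under assumptions of my own (the helper `separate`, the toy bases, and the multiplicative KL updates, which are one common NMF-style realization of this idea): spectral bases are pre-learned for each source, only the mixing weights are estimated on the mixture, and each source is then rebuilt as a full spectrum from its own bases rather than by masking bins of the mixture.

```python
import numpy as np

def separate(V, W, k_a, iters=500):
    """Supervised separation sketch. V is a non-negative mixture spectrogram;
    W's columns are pre-learned spectral bases, each summing to 1, with the
    first k_a columns belonging to source A and the rest to source B.
    Only the mixing weights H are estimated (multiplicative KL updates);
    each source is reconstructed as an entire spectral description."""
    K = W.shape[1]
    T = V.shape[1]
    H = np.full((K, T), 1.0)              # flat initialization of weights
    for _ in range(iters):
        R = np.maximum(W @ H, 1e-12)      # current model of the mixture
        H *= W.T @ (V / R)                # KL update (denominator is 1 since
                                          # W's columns each sum to 1)
    src_a = W[:, :k_a] @ H[:k_a]          # full spectrum attributed to source A
    src_b = W[:, k_a:] @ H[k_a:]          # full spectrum attributed to source B
    return src_a, src_b
```

Because the output is a product of bases and weights, the reconstruction assigns energy to every time-frequency bin of each source, in contrast with a binary mask that only keeps bins where one source dominates.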
See other events that are part of Brains & Machines Seminar Series 2007