We study a deep matrix factorization problem. It takes as input a matrix X obtained by multiplying K matrices (called factors). Each factor is obtained by applying a fixed linear operator to a vector of parameters satisfying a sparsity constraint. We provide sharp conditions on the structure of the model that guarantee the stable recovery of the factors from the knowledge of X and the model for the factors. This is crucial in order to interpret the factors and the intermediate features obtained when applying a few factors to a datum. When K = 1: the paper provides compressed sensing statements; K = 2 covers (for instance) Non-negative Matrix Factorization, Dictionary learning, low rank approximation, phase recovery. The particularity of this paper is to extend the study to deep problems. As an illustration, we detail the analysis and provide (entirely computable) guarantees for the stable recovery of a (non-neural) sparse convolutional network.

We study a deep matrix factorization problem. It takes as input a matrix X obtained by multiplying K matrices (called factors). Each factor is obtained by applying a fixed linear operator to a short vector of parameters satisfying a model (for instance sparsity, grouped sparsity, non-negativity, constraints defining a convolution network\ldots). We call the problem deep or multi-layer because the number of factors is not limited. In the practical situations we have in mind, we can typically have K=10 or 100. This work aims at identifying conditions on the structure of the model that guarantees the stable recovery of the factors from the knowledge of X and the model for the factors.We provide necessary and sufficient conditions for the identifiability of the factors (up to a scale rearrangement). We also provide a necessary and sufficient condition called Deep Null Space Property (because of the analogy with the usual Null Space Property in the compressed sensing framework) which guarantees that even an inaccurate optimization algorithm for the factorization stably recovers the factors. We illustrate the theory with a practical example where the deep factorization is a convolutional network.

Speech signals are complex intermingling of various informative factors, and this information blending makes decoding any of the individual factors extremely difficult. A natural idea is to factorize each speech frame into independent factors, though it turns out to be even more difficult than decoding each individual factor. A major encumbrance is that the speaker trait, a major factor in speech signals, has been suspected to be a long-term distributional pattern and so not identifiable at the frame level. In this paper, we demonstrated that the speaker factor is also a short-time spectral pattern and can be largely identified with just a few frames using a simple deep neural network (DNN). This discovery motivated a cascade deep factorization (CDF) framework that infers speech factors in a sequential way, and factors previously inferred are used as conditional variables when inferring other factors. Our experiment on an automatic emotion recognition (AER) task demonstrated that this approach can effectively factorize speech signals, and using these factors, the original speech spectrum can be recovered with high accuracy. This factorization and reconstruction approach provides a novel tool for many speech processing tasks.

Image Credit: NASA/JPL-Caltech/Space Science Institute

File name: W00110282.jpg,

https://saturn.jpl.nasa.gov/raw_images/426594Taken: Sep. 14, 2017 7:59 PM

Received: Sep. 15, 2017 7:04 AM

The camera was pointing toward SATURN, and the image was taken using the CL1 and CL2 filters. This image has not been validated or calibrated. A validated/calibrated image will be archived with the NASA Planetary Data System.