NUMML 2010 -- NIPS Workshop on Numerical Mathematics Challenges in Machine Learning

NUMML 2010

(Dec. 11, 2010)
NIPS 2010 Workshop, to be held in Whistler, Canada.

Alan Willsky

A Personal Journey: From Signals and Systems to Graphical Models
Hide / Show Abstract

This talk outlines a meandering line of research, started in 1988, that began with some signal processors and control theorists trying to make statistical sense of the emerging field of wavelet analysis and, to the speaker's surprise, moved into areas that certainly take advantage of his S&S/CT background but evolved into topics in a variety of fields involving efficient numerical methods for large-scale data assimilation and mapping and, eventually, a rapprochement with graphical models and machine learning.

The talk will touch on our early work on multiresolution tree models (motivated by wavelets but only indirectly relevant to them), the way control theorists think of inference and approximate modeling/stochastic realization, with at least one application that rings of the way a machine learning researcher might build a model – but not a mathematical physicist! I’ll provide (finally) a real rapprochement with wavelets and then turn to approximate inference on loopy graphs. The first approach builds on an idea used in the solution of PDEs, namely nested dissection, but with some machine learning twists, and some control-theoretic stability issues (showing how control might have a few things to provide to inference algorithm designers).

I will then turn to a topic again related to the ways in which numerical linear algebraists think about solving sparse systems of equations, but in the context of graphical models, this leads to the idea of walk-sum analysis, a surprisingly useful (and at least I think intuitive) idea. Walk-sum analysis then allows us to say some fairly strong things about a variety of iterative algorithms (generally known as either Richardson iterations, Jacobi iterations, or Gauss-Seidel iterations), including adaptive algorithms to guide iterations for fast convergence. Walk-sum analysis is also key in another approach with linear algebraic interpretations, involving so-called feedback vertex sets.

I will also touch on an alarmingly accurate method for computing variances in graphical models that involves using a low-rank approximation to the identity matrix(!) and then return to multiresolution models but now looking at models motivated by two quite different classes of numerical algorithms: multigrid algorithms and multipole algorithms. These algorithms motivate two quite different classes of models, the latter of which requires the introduction of what we refer to as conjugate graphs.

If I have any time and energy left, I will comment on some prospective topics.

Rick Szeliski

Multi-level preconditioners for image processing applications.

Michael Mahoney

Linear Algebra and Machine Learning of Large Informatics Graphs
Hide / Show Abstract
Very large informatics graphs such as large social and information networks typically have properties that render many popular machine learning and data analysis tools largely inappropriate. While this is problematic for these applications, it also suggests that these graphs may be useful as a test case for the development of new algorithmic tools that may then be applicable much more generally. Many of the popular machine learning and data analysis tools rely on linear algebra, and they are typically used by calling traditional numerical linear algebra code as a black box. After briefly reviewing some of the structural properties of large social and information networks that are responsible for the inapplicability of traditional linear algebra and machine learning tools, I will describe several examples of "new linear algebra" and "new machine learning" that arise from the analysis of such informatics graphs. These new directions involve looking "inside" the black box, and they place very different demands on the linear algebra than are traditionally placed by numerical, scientific computing, and small-scale machine learning applications.

Constantine Caramanis

Robust PCA and Collaborative Filtering: Rejecting Outliers, Identifying Manipulators
Hide / Show Abstract
Principal Component Analysis is one of the most widely used techniques for dimensionality reduction. Nevertheless, it is plagued by sensitivity to outliers; finding robust analogs, particularly for high-dimensional data, is critical. We discuss the challenges posed by the high dimensional setting, where dimensionality is of the same order, or greater, than the number of samples. We detail why existing techniques fail -- indeed, no known algorithm can provide provable bounds to any constant fraction of outliers -- and then present two very different algorithms for High Dimensional Robust PCA.
Our first algorithm achieves a breakdown point of 50% -- the best possible using any algorithm, and a stark improvement from the previous best-known result of 0%. Our second algorithm is based on ideas from convex optimization, and in addition to recovering the principal components, is also able to identify the corrupted points. We extend this to the partially observed setting, significantly extending matrix completion results to the setting of corrupted rows or columns.

Dan Kushnir

Algebraic Multigrid Methods with Applications to Data Analysis
Hide / Show Abstract
Multigrid solvers for solving linear problems (structured or unstructured) are based on iterating between two processes:

Classical relaxation schemes, which are generally slow to converge but fast to ``smooth'' the error function.

Approximating the ``smooth'' error function on a coarser level (a coarser grid or, more generally, a smaller graph, typically having one quarter to one half the number of variables). This is done by solving coarse-level equations derived from the fine-level system and the residuals of its current approximate solution, and then interpolating that coarse-level solution to correct the fine-level approximation. The solution of the coarse level equations is obtained by using recursively the same two processes, employing still coarser levels.

The most basic Multigrid solvers are the Geometric solvers (GMG) in which the problem is defined on a geometric grid. Of more interest, in the context of Machine learning, are the Algebraic Multigrid (AMG) solvers designed for unstructured problems. In this tutorial I will introduce the AMG technique, and its adoption for solving numerical problems in the context of Data Analysis. In particular, I will describe several efficient AMG eigensolvers that can be used within spectral graph methods such clustering, image segmentation, and dimensionality reduction. It turns out that the Algebraic Multigrid coarsening procedure itself can yield efficient and accurate algorithms for data clustering and image segmentation without any explicit solution of the related equations

Stefan Harmeling

Efficient space-variant blind deconvolution
Hide / Show Abstract

Blur in photos due to camera shake, blur in astronomical image sequences due to atmospheric turbulence, and blur in magnetic resonance imaging sequences due to object motion are examples of blur that can not be adequately described as a space-invariant convolution, because such blur varies across the image. In this talk, we present a class of linear transformations, that are expressive enough for space-variant blurs, but at the same time especially designed for efficient matrix-vector-multiplications. Successful results on the above-mentioned examples demonstrate the practical significance of our approach.

Invited Talks

NUMML 2010

Alan Willsky

A Personal Journey: From Signals and Systems to Graphical Models

Rick Szeliski

Multi-level preconditioners for image processing applications.

Michael Mahoney

Linear Algebra and Machine Learning of Large Informatics Graphs

Constantine Caramanis

Robust PCA and Collaborative Filtering: Rejecting Outliers, Identifying Manipulators

Dan Kushnir

Algebraic Multigrid Methods with Applications to Data Analysis

Stefan Harmeling

Efficient space-variant blind deconvolution

Categories