Workshop on kernel and sampling methods for design and quantization

When & where
Remote participation
Presentation of the workshop
Registration
Agenda
Speakers
Abstracts

When & where

The workshop will (hopefully) take place IRL, on November 15, 2021 at Institut Henri Poincaré, Paris.

Room: amphithéâtre Hermite (ground floor).

Remote participation

Remote participation will be possible as well, using Zoom.

The link will be sent a few days before the workshop to all registered participants.

Presentation of the workshop

The recent period has seen a considerable increase in the use of kernel methods in various fields of applied mathematics and statistics, with a flourishing expansion of applications in many areas of engineering. This includes in particular, but not limited to, function approximation and integration, and covers many topics of interest to the MASCOT-NUM network. In recent years, minimisation of a kernel discrepancy (the maximum mean discrepancy) has been popularised as a general tool for quantifying a probability distribution and constructing space filling designs, and determinantal point processes have emerged as a very promising tool for generating random sets of points with appropriate repulsion properties.

The workshop aims to introduce these modern topics to the MASCOT-NUM community through a series of presentations by key researchers in the field.

Organizers: Luc Pronzato, Julien Bect, Anthony Nouy.

Registration

Registration is free but appreciated.

Agenda

9:30–9:40: Luc Pronzato (introduction)
9:40–10:30: Pierre Olivier Amblard
coffee break (30')
11:00–11:35: Pierre Olivier Amblard
11:35–12:30: Jean-François Coeurjolly
lunch break
14:30–15:25: Rémi Bardenet
15:25–16:20: Onur Teymur
coffee break (15')
16:35–17:30: Marina Riabiz

Speakers

Luc Pronzato (CNRS & Univ. Côte d'Azur, France)
♣ Introduction

Pierre Olivier Amblard (CNRS & Univ. Grenoble Alpes, France)
♣ An introduction to determinantal point processes
♣ GP regression in the flat limit

Jean-François Coeurjolly (Univ. Grenoble Alpes, France)
♣ Repulsiveness for integration (not my social program)

Rémi Bardenet (CNRS & Univ. Lille, France)
♣ Interpolation and experimental design with volume sampling

Onur Teymur (Univ. of Kent & the Alan Turing Institute, UK)
♣ Optimal quantisation of probability measures using maximum mean discrepancy

Marina Riabiz (King's College London & the Alan Turing Institute, UK)
♣ Kernel Stein discrepancy minimization for MCMC thinning, with application to cardiac electrophysiology

Abstracts

P.-O. Amblard — An introduction to Determinantal Point Processes

Determinantal Point Processes (DPPs) are ubiquitous: They appear in many branches of Physics and Mathematics: models for fermions, random matrices, random graphs, random polynomials, random Gaussian processes, ... But they are also studied and used in machine learning as a prime example of a negatively correlated point process. Negative correlation produces diversity, an important notion in sampling examples out of a set.

In this introduction we will present the basic theory for determinantal processes restricting the ideas to the discrete setting. We will also give some information concerning the sampling of DPP's and will present an application of discrete DPPs to coresets (provable highly informative subsamples).

(with many interactions with (and inputs from) Simon Barthelmé and Nicolas Tremblay)

[slides (3.5 MB)]

P.-O. Amblard — Gaussian process regression in the flat limit

We study Gaussian Process regression, a pilar in Bayesian machine learning, also well-known as Kriging, in a limit which may at first glance appear as naive: We let the scale parameter of the kernel going to infinity, whence the name flat limit.

Surfing on the pionneering work of Driscol&Fornberg (2002), recently refined by Barthelmé&Usevich (2019) for general kernels, we uncover intriguing behavior of GP regression in the flat limit. Depending on the roughness of the kernel and on an amplitude rescaling, GP regression in the flat limit may lead to a penalized polynomial regression or to polyharmonic spline regression.

This short presentation will deal the very core of the results (spectral behaviour of the kernel in the limit), and will illustrate the limits in 1d and 2d.

(joint work with Simon Barthelmé, Nico Tremblay and Kostya Usevich)

[slides (2.4 MB)]

J.-F. Coeurjolly — Repulsiveness for integration (not my social program)

Integral estimation in any dimension is an extensive topic, largely treated in the literature, with a broad range of applications. Monte-Carlo type methods arise naturally when one looks forward to quantifying/controlling the error. Many methods have already been developped: MCMC, Poisson disk sampling, QMC (and randomized versions), Bayesian quadrature, etc. In this talk, I’ll consider a different approach which consists in defining the quadrature nodes as the realization of a spatial point process. In particular I’ll show that a very specific class of determinantal point processes, a class of repulsive point patterns, has excellent properties and is able to estimate efficiently integrals for non-differentiable functions with an explicit and faster rate of convergence than current methods.

(based on joint works with A. Mazoyer and P.-O. Amblard)

[slides (10 MB)]

R. Bardenet — Interpolation and experimental design with volume sampling

Volume sampling (VS) is a distribution over sets of design nodes in a generic space. VS is defined using a kernel and a reference measure, and samples exhibit a trade-off between repulsiveness, governed by the kernel, and marginal relevance, governed by the reference measure. I will first show that VS yields optimal interpolation points in reproducing kernel Hilbert spaces. Then, I will investigate VS for classical and Bayesian experimental design in generic design spaces, extending recent work on the theoretical properties of VS by Nikolov, Derezinski, et al. I will also connect VS to determinantal point processes, a related distribution over configurations of points.

(joint work with Ayoub Belhadji, Pierre Chainais, and Arnaud Poinas)

[slides (3.2 MB)]

O. Teymur — Optimal quantisation of probability measures using MMD

Several researchers have proposed minimisation of maximum mean discrepancy (MMD) as a method to quantise probability measures, i.e., to approximate a probability distribution by a representative point set. After giving a brief introduction to MMD, we will consider sequential algorithms that greedily minimise MMD over a discrete candidate set, thus performing a sort of sample ‘thinning’. This generalises the approach of Riabiz et al. (2020), who minimise kernel Stein discrepancy for the same task.

After this, we will look at non-myopic approaches for the same task i.e. algorithms that perform discrete (integer quadratic) optimisation to select more than one point simultaneously at each step and, in order to both improve statistical efficiency and reduce computational cost, we investigate a variant that applies this technique to mini-batches of the candidate set. We will look at some examples of the use of these algorithms on a range of important computational problems, including optimisation of nodes in Bayesian cubature and the thinning of Markov chain output, and discuss how they might be implemented in practice. If there is time, we will also consider a method of reducing the cost of the iteration-by-iteration optimisations that employs semi-definite relaxation for integer quadratic programmes. This approach has promise but is also challenging to implement; we will discuss the reasons why.

(joint work with Jackson Gorham, Marina Riabiz and Chris Oates)

[slides (1.8 MB)]

M. Riabiz — Kernel Stein discrepancy minimization for MCMC thinning...

...with application to cardiac electrophysiology

Calcium is the end-point intracellular signal driving cardiac myocyte contraction, and its dynamics is described through a set of coupled ordinary differential equations (ODEs). Markov Chain Monte Carlo (MCMC) can be used to characterize the posterior distribution of the parameters of the cardiac ODEs, which can then serve as an experimental design for multi-scale models of the whole hearth. However, in this setting MCMC suffers from poor mixing, so that post-processing of the MCMC output is required. Existing heuristics to assess the convergence and compress the MCMC output can produce sub-optimal empirical approximations, that suffer from bias-variance trade-off if the length of the MCMC output is fixed. In this talk, I will present a novel method that retrospectively selects a subset of states, of fixed cardinality, from the sample path, such that the approximation provided by their empirical distribution is close to optimal. This is based on greedy minimisation of a kernel Stein discrepancy, and it is suitable when the gradient of the log-target can be evaluated and an approximation using a small number of states is required. Theoretical results guarantee consistency of the method and I will demonstrate its effectiveness in the cardiac electrophysiology problem at hand.

Software is available at stein-thinning.org.

(joint work with Wilson Ye Chen, Jon Cockayne, Steven Niederer, Lester Mackey, Chris Oates)

[slides (w/o video, 4.3 MB)]

[slides (w/ video, 27.9 MB)]