Contributions to the theoretical analysis of statistical learning and uncertainty quantification methods


Modern data analysis provides scientists with statistical and machine learning algorithms with impressive performance. In front of their extensive use to tackle problems of constantly growing complexity, there is a real need to understand the conditions under which algorithms are successful or bound to fail. An additional objective is to gain insights into the design of new algorithmic methods able to tackle more innovative and challenging tasks. A natural framework for developing a mathematical theory of these methods is nonparametric inference. This area of Statistics is concerned with inferences of unknown quantities of interest under minimal assumptions, less restrictive than classical (parametric) statistics. The common thread is infinite-dimensional statistical modeling of a parameter on the data-generating mechanism. This flexibility is all the more appealing as we seek reliable algorithms under a wide range of settings, and the progress of data acquisition techniques results in massive and complex datasets. This last point prompts us to conduct an asymptotic analysis, which is a traditional approach to assessing the performance of learning procedures. In this thesis, we consider both problems of function estimation and uncertainty quantification.

The first class of algorithms we deal with are Bayesian tree-based methods. They are based on a ‘divide-and-conquer’ principle, partitioning a sample space to estimate the parameter locally. In regression, these methods include BCART and the renowned BART, the later being an ensemble of trees or a forest. In density estimation, the famous Pólya Tree prior exemplifies these methods and is the building block of a myriad of related constructions. We propose a new extension, DPA, that is a ‘forest of PTs’ and is shown to attain minimax contraction rates adaptively in Hellinger distance for arbitrary Hölder regularities. Adaptive rates in the stronger supremum norm are also obtained for the flexible Optional Pólya Tree (OPT) prior, a BCART-type prior, for regularities smaller than one. Gaussian processes are another popular class of priors studied in Bayesian nonparametrics and Machine Learning. Motivated by the ever-growing size of datasets, we propose a new horseshoe Gaussian process with the aim to adapt to leverage a data structure of smaller dimension. First, we derive minimax optimal contraction rates for its tempered posterior. Secondly, deep Gaussian processes are Bayesian counterparts to the famous deep neural networks. We prove that, as a building block in such a deep framework, it also gives optimal adaptive rates under compositional structure assumptions on the parameter. As for uncertainty quantification (UQ), Bayesian methods are often praised for the principled solution they offer with the definition of credible sets. We prove that OPT credible sets are confidence sets with good coverage and size (in supremum norm) under qualitative self-similarity conditions. Moreover, we conduct a theoretical study of UQ in Wasserstein distances Wp, uncovering a new phenomenon. In dimensions smaller than $4$, it is possible to construct confidence sets whose $W_p$-radii, $p\leq2$, adapt to any regularities (with no qualitative assumptions). This starkly contrasts the usual Lp theory, where concessions always have to be made.

Sep 28, 2022 14:00 — 18:00
PhD Defence
Sorbonne Université (Campus Pierre et Marie Curie), room 16-26-209 (Paul Lévy)
4, place Jussieu
75005 Paris