Chiara Amorino: Minimax rate for multivariate data under componentwise local differential
privacy constraints

Abstract:
In this talk, we analyse the balance between maintaining privacy and preserving
statistical accuracy when dealing with multivariate data that is subject to componentwise local
differential privacy (CLDP). With CLDP, each component of the private data is made public
through a separate privacy channel. This allows for varying levels of privacy protection for
different components or for the privatization of each component by different entities, each with
their own distinct privacy policies. We develop general techniques for establishing minimax
bounds that shed light on the statistical cost of privacy in this context, as a function of the
privacy levels $\alpha_1, \ldots, \alpha_d$ of the $d$ components.
We demonstrate the versatility and efficiency of these techniques by presenting various
statistical applications. Specifically, we examine nonparametric density and covariance
estimation under CLDP, providing upper and lower bounds that match up to constant factors,
as well as an associated data-driven adaptive procedure. Furthermore, we quantify the
probability of extracting sensitive information from one component by exploiting the fact that,
on another component which may be correlated with the first, a smaller degree of privacy
protection is guaranteed.
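To fix ideas, a componentwise privacy channel of the kind described above can be sketched with Laplace noise calibrated to each component's own privacy level; the function name, the boundedness assumption and the clipping step below are ours, for illustration only.

```python
import numpy as np

def componentwise_laplace(x, alphas, bound=1.0, rng=None):
    """Release each component of x through its own Laplace privacy channel.

    Component j, assumed to lie in [-bound, bound], is clipped and perturbed
    with Laplace noise of scale 2*bound/alphas[j], which makes the j-th
    channel alpha_j-locally differentially private.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.clip(np.asarray(x, dtype=float), -bound, bound)
    scales = 2.0 * bound / np.asarray(alphas, dtype=float)
    return x + rng.laplace(scale=scales, size=x.shape)
```

Since the Laplace noise is centred, averaging many privatized copies recovers the clipped value; a smaller $\alpha_j$ means more noise on component $j$.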

This is based on a joint work with A. Gloter.

Zhigang Bao: Phase Transition of Eigenvectors for Spiked Random Matrices

Abstract:
In this talk, we will first provide an overview of recent findings concerning eigenvectors
of random matrices under fixed-rank deformation. We will then shift our focus towards analyzing
the limit distribution of the leading eigenvectors of deformed models in the critical regime of the
Baik-Ben Arous-Péché (BBP) phase transition. The distribution is determined by a determinantal
point process with an extended Airy kernel. This result can be seen as an eigenvector counterpart
to the BBP eigenvalue phase transition.

The talk will be based on a joint work with Dong Wang.

Yannick Baraud: A new look at Bayesian Statistics

Abstract:
We address the problem of estimating the distribution of presumed i.i.d. observations
within the framework of Bayesian statistics. To do this, we consider a statistical model for the
distribution of the data as well as a prior on it and we propose a new posterior distribution that
shares some similarities with the classical Bayesian one. In particular, when the statistical model
is exact, we show that this new posterior distribution concentrates its mass around the target
distribution, just as the classical Bayesian posterior would do under appropriate assumptions.
Nevertheless, we establish that this concentration property holds under weaker assumptions
than those generally required for the classical Bayesian posterior. Specifically, we do not require
that the prior distribution allocates sufficient mass on Kullback-Leibler neighbourhoods but only
on the larger Hellinger ones. More importantly, unlike the classical Bayesian posterior, ours
proves to be robust against a potential misspecification of the prior and the assumptions we
started from. We prove that the concentration properties we establish remain stable when the
equidistribution assumption is violated or when the data are i.i.d. with a distribution that does
not belong to the model but only lies close enough to it. The results we obtain are nonasymptotic
and involve explicit numerical constants.

Denis Belomestny: Provable Benefits of Policy Learning from Human Preferences

Abstract:
A crucial task in reinforcement learning (RL) is reward construction. In practice, there is often no obvious choice of reward function.
Thus, a popular approach is to introduce human feedback during training and leverage such feedback to learn a reward function.
Among all policy learning methods that use human feedback, preference-based methods have demonstrated substantial success in recent empirical
applications such as InstructGPT. In this work, we develop a theory
that provably shows the benefits of preference-based methods in tabular and linear MDPs.
The main idea of our method is to use KL-regularization with respect to the learned policy to ensure more stable learning.

Benoît Collins: On the norm of random matrices with a tensor structure

Abstract:
Random matrices with tensor structures are important in many areas, including
operator algebras, artificial intelligence, graph theory, etc. An important problem is to establish
limit theorems for the operator norm of models obtained from algebraic operations involving
multiple copies of such random tensors. This talk will describe more precisely relevant questions
and recent progress and applications.

It is primarily based on collaborations with Charles Bordenave.

Arnak Dalalyan: Langevin Monte Carlo: Randomized mid-point method revisited

Abstract:
Langevin Monte Carlo is an efficient and widely used method for generating random
samples from a given target distribution in a high-dimensional Euclidean space. Various variants
of the Langevin Monte Carlo method have been proposed and discussed in the literature;
depending on the properties of the target distribution, some variants may be preferred to others.
Among these variants, it has been shown that the Randomized Mid-Point Langevin Monte Carlo
(RMP-LMC) method has the best known non-asymptotic theoretical guarantees on the sampling
error, when the log-density of the target distribution has a Lipschitz-continuous gradient. The
objective of this talk is to review these results, as well as to present some extensions and
improvements.
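For concreteness, here is a minimal sketch of the randomized mid-point discretization of the overdamped Langevin diffusion $dX_t = \nabla \log p(X_t)\,dt + \sqrt{2}\,dB_t$; this is our simplified rendering, with illustrative names and step-size choices, not the talk's exact scheme.

```python
import numpy as np

def rmp_lmc(grad_log_p, x0, h, n_iter, rng):
    """Randomized mid-point Langevin Monte Carlo (illustrative sketch).

    Each step evaluates the drift at a uniformly randomized point of the
    step interval, using increments of a single Brownian path for both
    the mid-point move and the full step.
    """
    x = np.array(x0, dtype=float)
    d = x.size
    for _ in range(n_iter):
        a = rng.uniform()                        # random fraction of the step
        z1 = rng.standard_normal(d)              # Brownian increment on [0, a*h]
        z2 = rng.standard_normal(d)              # Brownian increment on [a*h, h]
        b_mid = np.sqrt(2.0 * a * h) * z1
        b_full = b_mid + np.sqrt(2.0 * (1.0 - a) * h) * z2
        y = x + a * h * grad_log_p(x) + b_mid    # move to the randomized mid-point
        x = x + h * grad_log_p(y) + b_full       # full step with mid-point drift
    return x
```

For a standard Gaussian target one takes `grad_log_p = lambda x: -x`; long runs then produce samples with mean close to 0 and variance close to 1.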

Charlotte Dion-Blanc: Multiclass classification for Hawkes process

Abstract:
We investigate the multiclass classification problem where the features are event
sequences. More precisely, the data are assumed to be generated by a mixture of simple linear
Hawkes processes. In this new setting, the classes are discriminated by various triggering
kernels. A challenge is then to build an efficient classification procedure. We derive the optimal
Bayes rule and provide a two-step estimation procedure of the Bayes classifier. In the first step,
the weights of the mixture are estimated; in the second step, an empirical risk minimization
procedure is performed to estimate the parameters of the Hawkes processes. We establish the
consistency of the resulting procedure and derive rates of convergence. Then, we tackle the
case of multivariate Hawkes processes. The challenge here is the high dimension of the
classification problem, which can be solved using a LASSO-type step in the procedure.
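Event sequences of the kind considered here can be simulated by Ogata's thinning algorithm; below is a minimal sketch for a univariate linear Hawkes process with an exponential triggering kernel (the kernel choice and parameter names are ours, for illustration only).

```python
import numpy as np

def simulate_hawkes_exp(mu, alpha, beta, T, rng):
    """Simulate a linear Hawkes process on [0, T] via Ogata's thinning.

    Intensity: lambda(t) = mu + sum_{t_i < t} alpha * exp(-beta * (t - t_i)),
    i.e. an exponential triggering kernel. Stationarity needs alpha/beta < 1.
    """
    events = []
    t, lam_star = 0.0, mu   # current time and an upper bound on the intensity
    while t < T:
        t += rng.exponential(1.0 / lam_star)     # candidate point
        if t >= T:
            break
        # true intensity at the candidate time (decays between events)
        lam_t = mu + sum(alpha * np.exp(-beta * (t - s)) for s in events)
        if rng.uniform() * lam_star <= lam_t:    # accept with prob lam_t/lam_star
            events.append(t)
            lam_star = lam_t + alpha             # intensity jumps by alpha
        else:
            lam_star = lam_t                     # tighten the bound and continue
    return np.array(events)
```

With `mu=1, alpha=0.5, beta=1` the stationary mean intensity is `mu/(1 - alpha/beta) = 2`, so roughly `2*T` events are expected on `[0, T]`.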

Joint work with Christophe Denis and Laure Sansonnet.

Gonçalo Dos Reis: New results on the simulation of mean field equations: super-linear
growth and the non-Markovian Euler schemes

Abstract:
In this talk, we review the state of the art and cover recent developments in the
simulation of mean-field equations of McKean-Vlasov type. In the first part, we discuss mean-field
diffusions under super-linear growth assumptions on the equation's coefficients. This class of
equations appears ubiquitously in interacting-particle system modelling. We discuss the
phenomenon of particle corruption in the simulations and illustrate our findings with a range of
examples.
In the second part of the talk, we discuss a recent method dubbed the 'non-Markovian Euler scheme'
that, although an Euler-type scheme with standard weak error rate 1, is able to attain a weak
convergence rate of 2 for the invariant distribution.

Clement Hardy: Prediction and testing of mixtures of features issued from a continuous
dictionary

Abstract:
In this talk, we will consider observations that are random elements of a Hilbert space
resulting from the sum of a deterministic signal and a noise. The signals considered will be linear
combinations (or mixtures) of a finite number of features issued from continuous parametric
dictionaries.

In order to estimate the linear coefficients as well as the non-linear parameters of a mixture in the presence of noise, we propose estimators that are solutions to an optimization problem. We shall quantify the performance of these estimators with respect to the quality of the observations by establishing prediction and estimation bounds that hold with high probability. In practice, it is common to have a set of observations (possibly a continuum) sharing common features. The question arises whether the estimation of signals can be improved by taking advantage of their common structure. We give a framework in which this improvement occurs.

Next, we shall test whether a noisy observation is derived from a given signal and give nonasymptotic upper bounds for the associated testing risk. In particular, our test encompasses the signal detection framework. We will derive an upper bound for the strength that a signal must have in order to be detected in the presence of noise.

This presentation is based on joint work with C. Butucea, J-F. Delmas and A. Dutfoy.

Johannes Moritz Jirak: Weak dependence and optimal quantitative self-normalized central limit
theorems

Abstract:
Motivated by high-dimensional problems, we revisit estimation of the long-run
variance, subject to dependence structures. More precisely, consider a stationary, weakly
dependent sequence of random variables. Given that a CLT holds, how should we estimate the
long-run variance? This problem has been studied for decades; prominent proposed solutions
were given, for instance, by Andrews (1991) or Newey and West (1994). Using the proximity of
the corresponding normal distribution as quality measure, we discuss optimal solutions and why
previous proposals are not optimal in this context.

The setup contains many prominent dynamical systems and time series models, including random walks on the general linear group, products of positive random matrices, functionals of GARCH models of any order, functionals of dynamical systems arising from SDEs, iterated random functions and many more.

Christophe Ley: Advances in statistics via tools from Stein's Method

Abstract:
Stein's method is becoming increasingly popular in statistics and machine learning. In
this talk, I will describe how various components of Stein's method, a well-known
approach in probability theory for approximation problems, have been recently put to successful
use in theoretical and computational statistics.

Yingying Li: Estimating Efficient Frontier with All Risky Assets

Abstract:
We propose a method to estimate the efficient frontier with all risky assets under a
high-dimensional setting. The method utilizes linear constrained LASSO based on an equivalent
constrained regression representation of the mean-variance optimization. Under a mild sparsity
assumption, we show that our estimator asymptotically achieves mean-variance efficiency.
Extensive simulation and empirical studies are conducted to examine the performance of our
proposed estimator.

Based on joint work with Leheng Chen and Xinghua Zheng.

Zeng Li: Robust estimation of number of factors in high dimensional factor modeling
via Spearman's rank correlation matrix

Abstract:
Determining the number of factors in high-dimensional factor modeling is essential but
challenging, especially when the data are heavy-tailed. In this paper, we introduce a new
estimator based on the spectral properties of Spearman's rank correlation matrix under the
high-dimensional setting, where both dimension and sample size tend to infinity proportionally. Our
estimator is applicable in scenarios where either the common factors or idiosyncratic errors
follow heavy-tailed distributions. We prove that the proposed estimator is consistent under mild
conditions. Numerical experiments also demonstrate the superiority of our estimator compared
to existing methods, especially in the heavy-tailed case.
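As a toy illustration of the object underlying the talk (not the estimator itself), one can compute the spectrum of Spearman's rank correlation matrix for data with a single common factor and heavy-tailed (Cauchy) idiosyncratic noise; rank correlations are invariant to monotone marginal transforms, which is the source of the robustness. All names and parameter choices below are ours.

```python
import numpy as np

def spearman_eigenvalues(X):
    """Sorted (descending) eigenvalues of the Spearman rank correlation
    matrix of an n x p data matrix X: Pearson correlation of columnwise ranks."""
    ranks = X.argsort(axis=0).argsort(axis=0) + 1.0   # ranks within each column
    R = np.corrcoef(ranks, rowvar=False)              # p x p Spearman matrix
    return np.sort(np.linalg.eigvalsh(R))[::-1]

# one common factor plus heavy-tailed idiosyncratic noise
rng = np.random.default_rng(0)
n, p = 500, 50
factor = rng.standard_normal(n)
X = 2.0 * np.outer(factor, np.ones(p)) + rng.standard_cauchy((n, p))
evals = spearman_eigenvalues(X)
# a single eigenvalue separates from the bulk, signalling one factor
```

Despite the Cauchy noise having no finite variance, the leading eigenvalue cleanly detaches from the bulk, which is what a spectrum-based estimator of the number of factors exploits.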

Dmytro Marushkevych: Parametric statistical inference for high-dimensional diffusions

Abstract:
This talk is dedicated to the problem of parametric estimation in the diffusion setting
and mostly concentrates on properties of the Lasso estimator of the drift component. More
specifically, we consider a multivariate parametric diffusion model X observed continuously over
the interval [0,T] and investigate drift estimation under sparsity constraints. We allow the
dimensions of the model and the parameter space to be large. We obtain an oracle inequality for
the Lasso estimator and derive an error bound for the L2-distance using concentration
inequalities for linear functionals of diffusion processes. The probabilistic part is based upon
elements of empirical process theory and, in particular, on the chaining method. Some
alternative estimation procedures, such as adaptive and relaxed Lasso will also be discussed to
give a perspective on improving the obtained results.

Felix Parraud: Free stochastic calculus and Random Matrix Theory

Abstract:
Recently we developed a method to compute asymptotic expansions of certain
quantities coming from Random Matrix Theory. One can then use those results to study the
spectral properties of polynomials in random matrices. This method relies notably on free
stochastic calculus. In this talk I shall introduce basic notions of this theory and show how it
naturally appears when studying random matrix stochastic processes in large dimension. I will
then explain how to apply that theory to Random Matrix Theory.

Giovanni Peccati: Quantitative CLTs in deep neural networks and coupling of Gaussian fields

Abstract:
Fully connected random neural networks are fascinating examples of random fields,
obtained by hierarchically juxtaposing layers of computational units - sometimes referred to as
neurons. Since the pioneering work of Neal (1996), it is known that neural networks exhibit
Gaussian behavior in the so-called "large-width limit", that is when the sizes of the layers
simultaneously diverge to infinity. One crucial question - which has been relatively little explored
in the literature - is how to measure the distance between the distribution of a fixed neural
network and its Gaussian counterpart. In this talk, I will explain how one can obtain probabilistic
bounds on such discrepancy - featuring an algebraic dependence on the network's width - by
exploiting (a) Stein's method (in the case of finite-dimensional approximations), and (b) some
estimates on the optimal coupling of Gaussian fields (in the case of functional approximations).

Based on joint work with S. Favaro, B. Hanin, D. Marinucci, and I. Nourdin.

Vincent Rivoirard: Bayesian nonparametric inference for nonlinear Hawkes processes

Abstract:
Hawkes processes are a specific class of point processes modeling the probability of
occurrences of an event depending on past occurrences. Hawkes processes are therefore
naturally used when one is interested in graphs for which the temporal dimension is essential. In
the linear framework, the statistical inference of Hawkes processes is now well understood. We will
therefore focus more specifically on the class of nonlinear multivariate Hawkes processes, which
allow one to model both excitation and inhibition phenomena between nodes of a graph. We will
present the Bayesian nonparametric estimation of the parameters of the Hawkes model and the
posterior contraction rates obtained on Hölder classes. From the practical point of view, since
simulating posterior distributions is often out of reach in reasonable time, especially in the
multivariate framework, we will more specifically use the variational Bayesian approach, which
provides a direct and fast computation of an approximation of the posterior distributions,
allowing the analysis in reasonable time of graphs containing several tens of neurons.

Joint work with Déborah Sulem and Judith Rousseau.

Judith Rousseau: Semi-parametric inference: A Bayesian curse?

Abstract:
In this talk I will discuss some issues around Bayesian approaches in semiparametric
inference. I will first recall some positive and negative results on Bernstein-von Mises theorems
in non- and semi-parametric models. I will then propose two possible tricks to derive posterior-type
distributions in semiparametric models which allow both for efficient procedures and
Bernstein-von Mises theorems, as well as flexible priors on the nonparametric part. The first
approach, based on the cut posterior, will be illustrated in semi-parametric mixture and Hidden
Markov models; the second, a targeted posterior, will be applied to the well-known causal
inference problem of average treatment effect estimation.

This talk builds on joint works with Edwin Fong, Chris Holmes, Dan Moss and Andrew Yiu.

Claudia Strauch: Change point estimation for a stochastic heat equation

Abstract:
We study a change point model based on a stochastic partial differential equation (SPDE) corresponding to the heat equation governed by the weighted
Laplacian $\Delta_\vartheta = \nabla\vartheta\nabla$, where $\vartheta=\vartheta(x)$ is a space-dependent diffusivity, on the domain $(0,1)$ with Dirichlet
boundary conditions. Based on local measurements of the solution in space with resolution $\delta$ over a finite time horizon, we develop a simultaneous
M-estimator for the diffusivity parameters $\theta_\pm$ and the change point $\tau$ characterizing the piecewise constant diffusivity $\vartheta$.
We work in the general setting where the parameters $\theta_\pm$ are allowed to vary with the resolution $\delta$. The change point estimator converges
at rate $\delta$, while the diffusivity constants can be recovered with convergence rate $\delta^{3/2}$. Moreover, when the diffusivity parameters are
known and the jump height vanishes with the spatial resolution tending to zero, we derive a limit theorem for the change point estimator and identify
the limiting distribution as one familiar from the change point literature. For the mathematical analysis, a precise understanding of the SPDE with
discontinuous $\vartheta$, tight concentration bounds for quadratic functionals of the solution, and a generalisation of classical M-estimators are developed.

Based on joint work with Markus Reiss and Lukas Trottner.

Martin Wahl: A kernel-based analysis of Laplacian eigenmaps

Abstract:
Laplacian eigenmaps and diffusion maps are nonlinear dimensionality reduction
methods that use the eigenvalues and eigenvectors of normalized graph Laplacians. From a
mathematical perspective, the main problem is to understand these empirical Laplacians as
spectral approximations of the underlying Laplace-Beltrami operator. In this talk, we study
Laplacian eigenmaps through the lens of kernel PCA. This leads to novel points of view and allows
us to leverage results for empirical covariance operators in infinite dimensions.

Qinwen Wang: Asymptotics of robust estimators of scatter in high dimensions

Abstract:
In this talk, we will investigate the limiting spectral properties of two robust estimators
of scatter: the sample spatial-sign covariance matrix and Tyler's M-estimator, in high-dimensional
scenarios. The populations under study are general enough to include the independent components
model and the family of elliptical distributions. These may come with known or unknown location
vectors. Both the empirical spectral distributions and the central limit theorems for a class of
linear spectral statistics of the two matrix ensembles are studied.

Jeff Jianfeng Yao: Limiting distributions for eigenvalues of sample correlation matrices from heavy-tailed populations

Abstract:
Consider a $p$-dimensional population $x\in \mathbb{R}^p$ with iid coordinates that are regularly varying with index $\alpha\in (0,2)$.
Since the variance of $x$ is infinite, the diagonal elements of the sample covariance matrix $S_n=n^{-1}\sum_{i=1}^n {x_i}x'_i$ based on a sample $x_1,\ldots, x_n$ from the population tend to infinity as $n$ increases and it is of interest to use instead the sample correlation matrix $R_n= \{\mathrm{diag}(S_n)\}^{-1/2}\, S_n\{\mathrm{diag}(S_n)\}^{-1/2}$.
This paper finds the limiting distributions of the eigenvalues of $R_n$ when both the dimension $p$ and the sample size $n$ grow to infinity such that $p/n\to \gamma \in (0,\infty)$.
The family of limiting distributions $\{H_{\alpha,\gamma}\}$ is new and depends on the two parameters $\alpha$ and $\gamma$.
The moments of $H_{\alpha,\gamma}$ are fully identified as the sum of two contributions: the first from the classical Mar\v{c}enko-Pastur law and a second due to heavy tails.
Moreover, the family $\{H_{\alpha,\gamma}\}$ has continuous extensions at the boundaries $\alpha=2$ and $\alpha=0$ leading to the Mar\v{c}enko-Pastur law and a modified Poisson distribution, respectively.
Our proofs use the method of moments, a path-shortening algorithm and some novel graph counting combinatorics.
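The objects in the abstract are straightforward to simulate; here is a small sketch (ours, not from the paper) that builds $R_n$ from iid entries with a Pareto-type tail of index $\alpha=1.5$ and computes its spectrum.

```python
import numpy as np

def correlation_eigenvalues(X):
    """Eigenvalues of R_n = diag(S_n)^{-1/2} S_n diag(S_n)^{-1/2},
    where S_n = (1/n) sum_i x_i x_i' for the rows x_i of X (shape n x p)."""
    n, _ = X.shape
    S = X.T @ X / n                          # sample covariance (uncentred)
    d = 1.0 / np.sqrt(np.diag(S))
    R = d[:, None] * S * d[None, :]          # sample correlation matrix
    return np.sort(np.linalg.eigvalsh(R))

# iid entries regularly varying with index alpha in (0, 2):
# symmetrized Pareto draws via an inverse-power transform of uniforms
rng = np.random.default_rng(0)
n, p, alpha = 400, 200, 1.5                  # p/n -> gamma = 0.5
signs = rng.choice([-1.0, 1.0], size=(n, p))
X = signs * rng.uniform(size=(n, p)) ** (-1.0 / alpha)
evals = correlation_eigenvalues(X)
# the diagonal of R_n is exactly 1, so the eigenvalues sum to p
```

Note that, unlike the sample covariance matrix, the spectrum of $R_n$ stays on a fixed scale (the eigenvalues sum to $p$) even though the entries have infinite variance, which is why the correlation matrix is the natural object here.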

This is a joint work with Johannes Heiny (Stockholm University).

Wangjun Yuan: On spectrum of sample covariance matrices from large tensor vectors

Abstract:
In this talk, we study the limiting spectral distribution of sums of independent rank-one
large k-fold tensor products of large n-dimensional vectors. In the literature, the limiting moment
sequence is obtained for the cases k = o(n) and k = O(n). Under appropriate moment conditions
on the base vectors, it has been shown that the eigenvalue empirical distribution converges to the
celebrated Marcenko-Pastur law if k = O(n) and the components of the base vectors have unit
modulus, or if k = o(n). In this talk, we study the limiting spectral distribution by allowing k to grow
much faster, whenever the components of the base vectors are complex random variables on the
unit circle. It turns out that the limiting spectral distribution is the Marcenko-Pastur law. Compared
with the existing results, our limiting setting only requires $k \to \infty$. Our approach is based on the
moment method.

Nikita Zhivotovskiy: Mean and Covariance Matrix Estimation for Anisotropic Distributions in
the Presence of Outliers

Abstract:
Suppose we are observing a sample of independent random vectors, knowing that the
original distribution was contaminated, so that a fraction of observations came from a different
distribution. How can one estimate the mean and the covariance matrix of the original distribution in
this case? In this talk, we discuss some recent estimators that achieve the optimal nonasymptotic,
dimension-free rate of convergence under the model where the adversary can
corrupt a fraction of the samples arbitrarily. The discussion will cover a range of distributions
including specifically Gaussian, sub-Gaussian, and heavy-tailed distributions.
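As a point of comparison for the contamination model in the abstract, here is a simple coordinatewise trimmed-mean baseline (a classical estimator, not the dimension-free-optimal ones discussed in the talk; all names and parameters are illustrative):

```python
import numpy as np

def trimmed_mean(X, eps):
    """Coordinatewise trimmed mean of an n x d sample: in every coordinate,
    drop the eps-fraction smallest and largest values and average the rest.
    A classical baseline under eps-contamination, not optimal in the
    dimension-free sense discussed in the talk."""
    n, _ = X.shape
    k = int(np.ceil(eps * n))        # number of points trimmed on each side
    Xs = np.sort(X, axis=0)
    return Xs[k:n - k].mean(axis=0)

# toy experiment: 5% of a Gaussian sample replaced by gross outliers
rng = np.random.default_rng(0)
n, d, eps = 1000, 5, 0.05
X = rng.standard_normal((n, d))
X[: int(eps * n)] = 100.0            # adversarial contamination
naive = X.mean(axis=0)               # pulled far away by the outliers
robust = trimmed_mean(X, 2 * eps)    # trim slightly more than eps per side
```

The sample mean is shifted by roughly `eps * 100` in every coordinate, while the trimmed mean stays near the true mean 0; the estimators in the talk sharpen this idea to get rates that do not degrade with the dimension.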