# Luxembourg-Waseda Conference on Modelling and Inference for Complex Data

### Fully Distribution-Free Center-Outward Rank Tests for Multiple-Output Regression and MANOVA

*by Marc Hallin (Université libre de Bruxelles)*

Extending rank-based inference to a multivariate setting such as multiple-output regression or MANOVA with unspecified \(d\)-dimensional error density has remained an open problem for more than half a century.
None of the many solutions proposed so far enjoys the combination of distribution-freeness and efficiency that makes rank-based inference a successful tool in the univariate setting.
A concept of *center-outward* multivariate ranks and signs based on measure transportation ideas has been introduced recently.
Center-outward ranks and signs are not only distribution-free but achieve in dimension \(d>1\) the (essential) maximal ancillarity property of traditional univariate ranks, hence carry all the "distribution-free information" available in the sample.
We derive here the Hájek representation and asymptotic normality results required in the construction of center-outward rank tests for multiple-output regression and MANOVA.
When based on appropriate spherical scores, these fully distribution-free tests achieve parametric efficiency in the corresponding models.


### Joint circular distributions in view of higher order spectra of time series and copula

*by Masanobu Taniguchi (Waseda University), winner of the 2022 Prize of the Minister of Education & Science from Japan*

Circular data analysis is emerging as an important component of statistics.
Over the past half century, various circular distributions have been proposed, e.g., the von Mises distribution and the wrapped Cauchy distribution, among others.
Also, regarding the joint distribution, Wehrly and Johnson (1980) proposed a bivariate circular distribution which is related to a family of Markov processes on the circle.
Because the sample space is on a circle, various new statistical methods have been developed.
In this talk we provide a new look at circular distributions in view of spectral distributions of time series because the typical circular distributions correspond to spectral densities of time series models.
For example, the autoregressive $AR(1)$ spectral density corresponds to the wrapped Cauchy distribution, and the von Mises distribution corresponds to the exponential spectral density (Bloomfield (1973)).
Furthermore we introduce a class of joint circular distributions from the higher order spectra of time series, which can describe very general joint circular distributions.
Hence we can develop the statistical inference for dependent observations on the circle.
We present a family of distributions on the circle derived from the ARMA spectral density.
It is seen that the proposed family includes some existing circular families as special cases.
For these special cases, the normalizing constant and trigonometric moments are shown to have simple and closed form.
We develop the asymptotically optimal inference theory based on local asymptotic normality (LAN) on the circle.
Because the observations are permitted to be dependent, the theory opens a new paradigm in the estimation for joint circular distributions.
Because we introduced very general joint circular distributions, we can discuss the problem of copula for them.

*This is joint work with Shogo Kato (Institute of Statistical Mathematics, Tokyo), Hiroaki Ogata (Tokyo Metropolitan University) and Arthur Pewsey (University of Extremadura, Spain).*
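The AR(1) correspondence mentioned in the abstract can be checked numerically. The sketch below assumes the standard textbook parametrisations, which the abstract does not spell out: the AR(1) process $X_t = \rho X_{t-1} + \varepsilon_t$ with innovation variance $\sigma^2 = 1 - \rho^2$, and the wrapped Cauchy density with mean direction 0 and concentration $\rho$. The function names are illustrative only.

```python
import math


def ar1_spectral_density(lam, rho, sigma2):
    """Spectral density of X_t = rho * X_{t-1} + eps_t with Var(eps_t) = sigma2."""
    return sigma2 / (2.0 * math.pi * (1.0 - 2.0 * rho * math.cos(lam) + rho ** 2))


def wrapped_cauchy_density(theta, rho, mu=0.0):
    """Wrapped Cauchy density with mean direction mu and concentration rho."""
    return (1.0 - rho ** 2) / (
        2.0 * math.pi * (1.0 + rho ** 2 - 2.0 * rho * math.cos(theta - mu))
    )


rho = 0.6
grid = [-math.pi + 2.0 * math.pi * k / 200 for k in range(201)]

# With sigma2 = 1 - rho^2 the spectral density integrates to one and
# coincides pointwise with the wrapped Cauchy density.
max_gap = max(
    abs(ar1_spectral_density(t, rho, 1.0 - rho ** 2) - wrapped_cauchy_density(t, rho))
    for t in grid
)
print(max_gap)
```

The normalisation $\sigma^2 = 1 - \rho^2$ is what turns the spectral density into a probability density on the circle; any other innovation variance gives the same shape up to a constant factor.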


### Estimation of mixed fractional stable processes using high-frequency data

*by Mark Podolskij (University of Luxembourg)*

The linear fractional stable motion generalizes two prominent classes of stochastic processes, namely stable Lévy processes and fractional Brownian motion.
For this reason it may be regarded as a basic building block for continuous time models.
We study a stylized model consisting of a superposition of independent linear fractional stable motions and our focus is on parameter estimation of the model.
Applying an estimating equations approach, we construct estimators for the whole set of parameters and derive their asymptotic normality in a high-frequency regime.
The conditions for consistency turn out to be sharp for two prominent special cases:
(i) for Lévy processes, i.e. for the estimation of the successive Blumenthal-Getoor indices, and
(ii) for the mixed fractional Brownian motion introduced by Cheridito.
In the remaining cases, our results reveal a delicate interplay between the Hurst parameters and the indices of stability.
Our asymptotic theory is based on new limit theorems for multiscale moving average processes.


### Parameter estimation of discretely observed interacting particle systems

*by Chiara Amorino (University of Luxembourg)*

We consider the problem of joint parameter estimation for drift and volatility coefficients of a stochastic McKean-Vlasov equation and for the associated system of interacting particles.
The analysis is provided in a general framework, as both coefficients depend on the solution of the process and on the law of the solution itself.
Starting from discrete observations of the interacting particle system over a fixed interval $[0, T]$, we propose a contrast function based on a pseudo likelihood approach.
We show that the associated estimator is consistent when the discretization step $\Delta_n$ goes to 0 and the number of particles $N$ goes to $\infty$, and asymptotically normal when additionally the condition $\Delta_n N \rightarrow 0$ holds.
We will also compare our results (and our condition on the decay of the discretization step) with the results known for classical SDEs.
The talk is based on joint work with A. Heidari, V. Pilipauskaite and M. Podolskij.
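To make the setting concrete, here is a minimal sketch under strong simplifying assumptions that are not taken from the talk: a toy mean-field model $dX^i_t = -\theta\,(X^i_t - \bar X_t)\,dt + \sigma\,dW^i_t$, where the empirical mean $\bar X_t$ stands in for the law of the solution, and a Gaussian (Euler-type) pseudo-likelihood contrast that has a closed-form minimiser in this linear case. The data are simulated with the same Euler scheme, so discretization bias is absent by construction. All names and parameter values are hypothetical.

```python
import math
import random


def simulate_particles(theta, sigma, n_particles, n_steps, horizon, rng):
    """Euler-Maruyama paths of dX^i = -theta * (X^i - mean(X)) dt + sigma dW^i."""
    dt = horizon / n_steps
    state = [rng.gauss(0.0, 1.0) for _ in range(n_particles)]
    path = [state]
    for _ in range(n_steps):
        centre = sum(state) / n_particles  # empirical mean stands in for the law
        state = [
            x - theta * (x - centre) * dt + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
            for x in state
        ]
        path.append(state)
    return path, dt


def estimate(path, dt):
    """Closed-form minimiser of the Gaussian pseudo-likelihood contrast."""
    num = den = 0.0
    for prev, nxt in zip(path, path[1:]):
        centre = sum(prev) / len(prev)
        for x, y in zip(prev, nxt):
            num += (x - centre) * (y - x)
            den += (x - centre) ** 2 * dt
    theta_hat = -num / den

    # Volatility estimate: normalised sum of squared drift-corrected increments.
    squared, count = 0.0, 0
    for prev, nxt in zip(path, path[1:]):
        centre = sum(prev) / len(prev)
        for x, y in zip(prev, nxt):
            squared += (y - x + theta_hat * (x - centre) * dt) ** 2
            count += 1
    return theta_hat, squared / (count * dt)


path, dt = simulate_particles(theta=1.0, sigma=0.5, n_particles=100,
                              n_steps=400, horizon=1.0, rng=random.Random(1))
theta_hat, sigma2_hat = estimate(path, dt)
print(theta_hat, sigma2_hat)
```

On a fixed horizon the drift parameter is learned from the number of particles rather than from a long time span, which is why the volatility estimate is typically much tighter than the drift estimate in runs like this one.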


### A model for higher-order circular Markov process

*by Hiroaki Ogata (Tokyo Metropolitan University)*

The strictly stationary higher-order Markov process for circular data is considered.
We employ the mixture transition distribution model to express the transition density of the process.
The underlying circular transition distribution is based on Wehrly and Johnson's bivariate circular models.
The structure of the circular autocorrelation function is found to be similar to the autocorrelation function of the autoregressive process on the line.
The circular partial autocorrelation and spectral density are also provided.
The validity of the model is assessed by applying it to a series of real directional data.
Joint work with Takayuki Shiohama (Nanzan University).
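A minimal simulation sketch of a mixture transition distribution on the circle follows. It assumes the simplest Wehrly-Johnson-type transition, with circular uniform marginals and a wrapped Cauchy binding density, so that the transition density is $\sum_j w_j\, g(\theta_t - \theta_{t-j})$ with $g$ wrapped Cauchy. The abstract does not specify these choices, and the function names are hypothetical.

```python
import math
import random

TWO_PI = 2.0 * math.pi


def sample_wrapped_cauchy(rho, rng):
    """Wrap a Cauchy draw with scale -log(rho) onto the circle (0 < rho < 1)."""
    scale = -math.log(rho)
    return (scale * math.tan(math.pi * (rng.random() - 0.5))) % TWO_PI


def simulate_mtd(weights, rho, n_obs, rng):
    """Simulate a circular series whose order equals len(weights)."""
    order = len(weights)
    lags = list(range(1, order + 1))
    series = [rng.uniform(0.0, TWO_PI) for _ in range(order)]  # uniform start
    while len(series) < n_obs:
        lag = rng.choices(lags, weights=weights)[0]  # pick which past angle drives the step
        series.append((series[-lag] + sample_wrapped_cauchy(rho, rng)) % TWO_PI)
    return series


series = simulate_mtd(weights=[0.7, 0.3], rho=0.8, n_obs=1000, rng=random.Random(0))
print(series[:5])
```

With a single lag the construction reduces to a first-order circular Markov chain whose increments are wrapped Cauchy, so the mean cosine of successive differences should be close to `rho`.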


### Integrated likelihood based inference for dynamic binary choice panel data models with fixed effects

*by Gautam Tripathi (Economics, University of Luxembourg)*

We use an integrated likelihood approach to estimate the parameters and marginal effects in an \(AR(1)\) binary choice panel data model with fixed effects.
Additional covariates in the model are treated as being predetermined each period, which allows for feedback from the current outcome to the future covariates.


### An extended sine-skewed circular distribution and its application to a model on a cylinder

*by Yoichi Miyata (Takasaki City University of Economics)*

The sine-skewed circular distributions are tractable circular probability models that can be asymmetric in shape and that have the advantage that the sine and cosine moments can be written in explicit form.
We use the framework proposed by Ley and Verdebout (2017) and Umbach and Jammalamadaka (2009) to propose a new family of circular probability distributions as an extension of the sine-skewed circular distribution.
This family includes distributions that can give stronger asymmetry around the mode than the sine-skewed circular distributions.
Furthermore, we show that a subfamily of the extended sine-skewed wrapped Cauchy distributions and that of the extended sine-skewed von Mises distributions are identifiable with respect to parameters and that the sine and cosine moments of proposed circular distributions are written in explicit forms.
We also propose a probability model on a cylinder whose marginal distribution is the proposed distribution and show its usefulness through an example of real data analysis.


### Aggregation of network traffic and anisotropic scaling of random fields

*by Vytautė Pilipauskaite (Aalborg University)*

In this talk we extend results of Mikosch, Resnick, Rootzén, Stegeman (2002), Gaigalas, Kaj (2003) and other papers on approximations to cumulative network traffic over a time interval, to new and more general input processes $X = \{X(t), t \in \mathbb{R}\}$.
More specifically, we study sums of $\lfloor \lambda^\gamma y \rfloor$ independent copies of $X$ integrated over time interval $(0, \lambda x]$, as a random field $A_{\lambda,\gamma} = \{A_{\lambda,\gamma} (x,y), (x,y) \in \mathbb{R}^2_+\}$ when $\lambda$ tends to infinity, for a given $\gamma > 0$.
We have two classes of stationary inputs $X$: (I) Poisson shot-noise with (random) pulse process, and (II) regenerative process with random pulse process and regeneration times following a heavy-tailed renewal process.
In both cases (I) and (II) we find simple conditions on $X$ so that the limit distribution of centered and appropriately normalized $A_{\lambda,\gamma}$ is a stable Lévy sheet if $\gamma < \gamma_0$, and a fractional Brownian sheet with Hurst parameter $(H,1/2)$ if $\gamma > \gamma_0$, for some $\gamma_0 > 0$.
We also prove an 'intermediate' limit for $\gamma = \gamma_0$.
Related transition of limit distribution also appears for other large classes of long-range dependent random fields on $\mathbb{Z}^2$ or $\mathbb{R}^2$ when their sums or integrals respectively are taken over anisotropically (non-uniformly) scaled rectangular sets.
The talk is based on joint work with Remigijus Leipus and Donatas Surgailis (Vilnius University, Lithuania).


### A simple EM algorithm for circular Cauchy type distributions

*by Toshihiro Abe (Hosei University)*

We consider a simple expression of the EM algorithm for Cauchy-type distributions on the circle.
As an alternative to the family of sine-skewed circular distributions, we consider other skew-symmetric circular distributions and try to give a simple EM algorithm for them.
Furthermore, we give EM algorithms for finite mixture models of the distributions.
Finally, we give examples of the proposed EM algorithm.


### On partial sum processes of functions of ARMAX residuals

*by Benjamin Holcblat (Finance, University of Luxembourg)*

We establish general and versatile results regarding the limit behavior of the partial-sum process of ARMAX residuals.
Illustrations include ARMA with seasonal dummies, misspecified ARMAX models with autocorrelated errors, nonlinear ARMAX models, ARMA with a structural break, a wide range of ARMAX models with infinite-variance errors, weak GARCH models and the consistency of kernel estimation of the density of ARMAX errors.
Our results identify the limit distributions, and provide a general algorithm to obtain pivot statistics for CUSUM tests.
Some extensions to ARMA with polynomial trend and a unit root are also presented.


### A copula model for trivariate circular data

*by Shogo Kato (The Institute of Statistical Mathematics)*

We propose a new family of distributions for trivariate circular data.
Its density can be expressed in simple form without involving infinite sums or integrals.
The univariate marginals of the proposed distributions are the uniform distributions on the circle, and therefore the presented family is considered a copula for trivariate circular data.
The bivariate marginals of the proposed distributions are members of the family of Wehrly and Johnson (1980).
The univariate and bivariate conditional distributions are the wrapped Cauchy distributions and the distributions of Kato and Pewsey (2015), respectively.
An efficient algorithm is presented to generate random variates from our model.
A closed-form expression is available for trigonometric moments.
Maximum likelihood estimation for the presented distributions is considered.
An extension of the proposed family for multivariate circular data is briefly discussed.

This is joint work with Christophe Ley of the University of Luxembourg, Luxembourg.

References:

- Kato, S. and Pewsey, A. (2015). A Möbius transformation-induced distribution on the torus. Biometrika, 102(2), 359-370.
- Wehrly, T.E. and Johnson, R.A. (1980). Bivariate models for dependence of angular observations and a related Markov process. Biometrika, 67(1), 255-256.

### Breakthrough of directional statistics in space science

*by Guendalina Palmirotta (European Space Agency)*

It should be no surprise that important foundations of modern statistical theory were formulated as early as the 17th and 18th centuries to address astronomical problems: the astronomers were the statisticians.
For instance, the 'almost coincidence' of the orbits of the planets in our Solar System with the ecliptic has intrigued scientists for a long time.
Even D. Bernoulli (in the 1730s) wondered whether this fact could happen 'by chance'.
In a statistical framework, one could think of using a uniformity test on the sphere.
Testing isotropy or, equivalently, testing uniformity on the unit hypersphere is one of the oldest as well as most fundamental problems in directional statistics and it is still much considered nowadays.
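One classical uniformity test on the sphere is the Rayleigh test, sketched below for unit vectors in $\mathbb{R}^3$. It is used here only as a familiar illustration and is not necessarily the test discussed in the talk. The statistic is $3n\|\bar X\|^2$, which is asymptotically $\chi^2_3$ under uniformity. The data are synthetic and the function names are hypothetical.

```python
import math
import random


def rayleigh_test_sphere(points):
    """Return the Rayleigh statistic 3 * n * ||mean||^2 and its chi-square(3) p-value."""
    n = len(points)
    mean = [sum(p[k] for p in points) / n for k in range(3)]
    stat = 3.0 * n * sum(m * m for m in mean)
    # Survival function of the chi-square distribution with 3 degrees of freedom.
    p_value = math.erfc(math.sqrt(stat / 2.0)) + math.sqrt(
        2.0 * stat / math.pi
    ) * math.exp(-stat / 2.0)
    return stat, p_value


def uniform_on_sphere(n, rng):
    """Draw n uniform directions by normalising standard Gaussian vectors."""
    points = []
    for _ in range(n):
        v = [rng.gauss(0.0, 1.0) for _ in range(3)]
        norm = math.sqrt(sum(c * c for c in v))
        points.append([c / norm for c in v])
    return points


stat, p_value = rayleigh_test_sphere(uniform_on_sphere(200, random.Random(0)))
print(stat, p_value)
```

The Rayleigh test is sensitive to a preferred direction but blind to antipodally symmetric alternatives, so axial data would call for a different statistic.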

Furthermore, with the growing volume of astronomical data, innovative directional statistical theories and models have been proposed to deal with space science issues such as tracking space objects.

In this talk, we will provide a review of the many old and recent developments of directional statistics animated by interesting applications in space science.
This is joint work with Christophe Ley.


### GNSS-R at UL: overview and recent advances

*by Sajad Tabibi (Engineering, University of Luxembourg)*

Global navigation satellite system reflectometry (GNSS-R) has been used to exploit signals of opportunity at $L$-band for soil moisture, sea ice, and ocean wind remote sensing.
On the one hand, the delay of reflections with respect to the line-of-sight propagation can be used to derive the vertical distance between the receiving platform and the surface level.
On the other hand, the distortion of the reflected signal can be used to characterize the reflecting surface such as surface roughness and dielectric properties.
This presentation will describe different GNSS-R payloads and present a retrieval algorithm for the sea level and soil moisture studies, as well as sea-ice under grazing angle geometries.
We present a method to mitigate uncertainties in the sea level studies using the inversion formal uncertainty and modulation-specific variance factors.
The RMSE between sub-hourly GNSS-R retrievals and tide gauge (TG) records is 1.98 cm, with a correlation coefficient of 0.998.
The RMSE between grazing angle GNSS-R retrievals and the reference surface model is 16.4 cm using ionosphere-free total phase measurements.
Finally, we discuss the capability of the GNSS-R mission to sense variations in soil moisture, with a ubRMSD of 0.062 $\mathrm{m^3\,m^{-3}}$ compared to SMAP as the reference.
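For readers unfamiliar with the validation metrics quoted above, the following sketch computes RMSE, unbiased RMSD (ubRMSD, the RMSE after removing the mean difference) and the Pearson correlation for two series. The numbers in the usage example are made up for illustration and are not the data behind the reported figures.

```python
import math


def rmse(estimate, reference):
    """Root-mean-square error between two equally long series."""
    n = len(estimate)
    return math.sqrt(sum((e - r) ** 2 for e, r in zip(estimate, reference)) / n)


def ubrmsd(estimate, reference):
    """RMSE after subtracting the mean difference (bias) between the series."""
    n = len(estimate)
    bias = sum(e - r for e, r in zip(estimate, reference)) / n
    return math.sqrt(sum((e - r - bias) ** 2 for e, r in zip(estimate, reference)) / n)


def pearson(x, y):
    """Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


# Hypothetical soil-moisture values in m^3 m^-3 (illustration only).
reference = [0.10, 0.15, 0.22, 0.30, 0.26]
retrieved = [0.14, 0.18, 0.27, 0.33, 0.32]
print(rmse(retrieved, reference), ubrmsd(retrieved, reference), pearson(retrieved, reference))
```

The three quantities are linked by `rmse**2 == bias**2 + ubrmsd**2`, which is why ubRMSD is preferred when a constant offset between sensor and reference is expected.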


### Statistically Enhanced Learning

*by Florian Felice (University of Luxembourg)*

Feature engineering is of critical importance in the field of Machine Learning (ML).
While any ML practitioner knows the importance of rigorously preparing data to obtain well-performing ML models, only scarce literature formalizes its benefits.
In this talk, we will present the method of Statistically Enhanced Learning (SEL), a generalization and formalization framework of existing feature engineering and extraction tasks in ML.
The difference with classical ML consists in the fact that certain predictors are not directly observed but obtained as statistical estimators.
Our goal is to show the increased performance of SEL compared to classical ML by means of Monte Carlo simulations.
We will also present a practical application whose goal is to create a prediction model of match results for the women's and men's handball tournaments at the Paris 2024 Olympic Games.
This is joint work with Andreas Groll and my supervisor Christophe Ley.


### Instrumental variable method for nonlinear time series models

*by Tomoyuki Amano (The University of Electro-Communications)*

In linear time series models, it is generally assumed that the explanatory variables and the disturbance are independent.
In economics, however, this condition is sometimes violated, and the ordinary least squares estimator is then generally biased.
Furthermore, this bias does not vanish even when the sample size is large.
To overcome this difficulty, P. G. Wright (1928) and S. Wright (1925) proposed the instrumental variable method, which Reiersøl (1941, 1945) and Geary (1949) developed further.
The method is treated at length in White (2001).
It was recently applied to the CAPM in Amano, Kato, Taniguchi (2012).
In this talk, we apply the method to nonlinear time series models and derive its asymptotic properties.
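The persistent bias of least squares and its removal by an instrument can be seen in the simplest linear case. The sketch below is an illustration under assumed conditions, not the nonlinear time series setting of the talk: a scalar regressor $x = z + u$, a disturbance equal to $u$ (hence correlated with $x$), and an instrument $z$ independent of $u$. All names and values are hypothetical.

```python
import random


def ols_slope(x, y):
    """Least squares slope through the origin."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)


def iv_slope(z, x, y):
    """Instrumental variable slope: sum(z * y) / sum(z * x)."""
    return sum(a * b for a, b in zip(z, y)) / sum(a * b for a, b in zip(z, x))


rng = random.Random(0)
beta = 2.0
n = 20000
z = [rng.gauss(0.0, 1.0) for _ in range(n)]   # instrument, independent of u
u = [rng.gauss(0.0, 1.0) for _ in range(n)]   # disturbance
x = [zi + ui for zi, ui in zip(z, u)]         # regressor correlated with u
y = [beta * xi + ui for xi, ui in zip(x, u)]

beta_ols = ols_slope(x, y)
beta_iv = iv_slope(z, x, y)
print(beta_ols, beta_iv)
```

In this design the least squares estimate converges to $\beta + \operatorname{Cov}(x,u)/\operatorname{Var}(x) = 2.5$ however large the sample, while the instrumental variable estimate converges to $\beta = 2$.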


### Flexible models for complex data and where to find them

*by Christophe Ley (University of Luxembourg)*

Probability distributions are the building blocks of statistical modelling and inference.
It is therefore of utmost importance to know which distribution to use in what circumstances, as wrong choices will inevitably entail a biased analysis.
In this talk, we will focus on circumstances involving complex data and describe popular flexible models for these settings.
We will strive to answer the question: Which properties ought a good flexible model to possess?


### Complex-valued time series models and their relations to directional statistics

*by Takayuki Shiohama (Nanzan University)*

The fluctuation of stationary time series often shows a certain periodic behavior and this pattern is usually summarized via a spectral density.
Since spectral density is a periodic function, it can be modeled by using a circular distribution function.
In this talk, several time series models are studied in relation to a circular distribution.
As an introduction, we illustrate how to model bivariate time series data using complex-valued time series in the context of circular distribution functions.
These models are extended to have a skewed spectrum by incorporating a sine-skewing transformation.
Two parameter estimation methods are considered and their asymptotic properties are investigated.
These theoretical results are verified via a Monte Carlo simulation.
Real data analyses illustrate the applicability of the proposed model.


### Variations on a theorem by de Jong

*by Giovanni Peccati (University of Luxembourg)*

In a classical contribution from 1990, P. de Jong established the surprising fact that an arbitrary sequence of normalized and degenerate U-statistics converges in distribution towards a Gaussian random variable if and only if its fourth cumulant converges to zero (and a certain Lindeberg-type condition is satisfied). In this talk, I will describe some recent extensions of this result to multi-dimensional and functional settings. In particular, our findings yield functional versions of the "universality of Wiener chaos phenomenon" first detected in Nourdin, Peccati, and Reinert (2010). My talk is based on the following joint works:

- Ch. Döbler, M. Kasprzak and G. Peccati. Weak convergence of U-processes with size-dependent kernels. Ann. Appl. Probab., 2022.
- Ch. Döbler, M. Kasprzak and G. Peccati. The multivariate functional de Jong CLT. Probab. Th. Rel. Fields, 2022+.


### New results in finite mixture modeling

*by Jean Schiltz (Finance, University of Luxembourg)*

We introduce an extension of Nagin's finite mixture model to underlying Beta distributions and present our R package `trajeR`, which makes it possible to calibrate the model.
Then, we test the model and illustrate some of the possibilities of `trajeR` by means of an example with simulated data.
In the second part, we use this model to analyze COVID-19 related data during the first part of the pandemic.
We identify a classification of the world into five groups of countries with respect to the evolution of the contamination rate and show that the median population age is the main predictor of group membership.
We do not, however, see any sign that the sanitary measures taken by the different countries against the propagation of the virus were effective.


### Using replica exchange Hamiltonian Monte Carlo and thermodynamic integration for comparison of dynamic rainfall-runoff models

*by Damian Mingo Ndiwago (Engineering, University of Luxembourg)*

Hydrologists often need to choose between competing hypotheses or weight the predictions of different models when averaging models.
Several criteria for choosing and weighting models have been developed, which balance model complexity and accuracy by penalising the number of model parameters.
The penalty is explicit for information theory approaches or implicit for Bayesian model selection based on marginal likelihood.
However, the marginal likelihood approximation is computationally intensive and slow for dynamic models with multiple modes.
This study uses replica exchange Hamiltonian Monte Carlo and thermodynamic integration for fast, simultaneous calculation of the marginal likelihood and identification of the parameters of dynamic rainfall-runoff models.
Using synthetic data, the method always selected the true model in our numerical experiments.
The technique was also applied to real data from Magela Creek in Australia.
For the real data, the selected model was not the one with the highest number of parameters.
The method is implemented using the differentiable programming software TensorFlow Probability.
This implementation can easily be applied to other types of models for fast simultaneous parameter estimation and model comparison.
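Thermodynamic integration rests on the identity $\log Z = \int_0^1 \mathbb{E}_\beta[\log L]\,d\beta$, where the expectation is taken under the power posterior proportional to prior times $L^\beta$. The sketch below checks this identity on a toy conjugate model chosen purely for illustration: $y_i \sim N(\mu, 1)$ with prior $\mu \sim N(0, \tau^2)$, for which both the power posterior and the exact evidence are available in closed form. In the study these expectations would come from replica exchange HMC samples at each temperature; here they are computed analytically. The temperature ladder $\beta_k = (k/K)^5$ and all names are assumptions.

```python
import math


def exact_log_evidence(y, tau2):
    """Closed-form log marginal likelihood of the conjugate normal model."""
    n, s, ss = len(y), sum(y), sum(v * v for v in y)
    return (-0.5 * n * math.log(2.0 * math.pi)
            - 0.5 * math.log(1.0 + n * tau2)
            - 0.5 * (ss - tau2 * s * s / (1.0 + n * tau2)))


def expected_log_lik(y, tau2, beta):
    """E[log L(mu)] under the Gaussian power posterior at inverse temperature beta."""
    n = len(y)
    precision = 1.0 / tau2 + beta * n
    mean = beta * sum(y) / precision
    var = 1.0 / precision
    return (-0.5 * n * math.log(2.0 * math.pi)
            - 0.5 * (sum((v - mean) ** 2 for v in y) + n * var))


def thermodynamic_integration(y, tau2, n_temps=500):
    """Trapezoidal rule over a ladder that is dense near beta = 0."""
    betas = [(k / n_temps) ** 5 for k in range(n_temps + 1)]
    values = [expected_log_lik(y, tau2, b) for b in betas]
    return sum(0.5 * (values[k] + values[k + 1]) * (betas[k + 1] - betas[k])
               for k in range(n_temps))


y_obs = [0.3, 1.2, -0.5, 0.8, 1.9]   # made-up observations
tau2 = 4.0
print(thermodynamic_integration(y_obs, tau2), exact_log_evidence(y_obs, tau2))
```

The ladder is concentrated near $\beta = 0$ because the integrand changes fastest where the power posterior moves away from the prior.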


### The search of universal estimators in statistics

*by Yannick Baraud (University of Luxembourg)*

When i.i.d. data are observed and one seeks to estimate their density based on a parametric (or non-parametric) statistical model, statisticians often use the maximum likelihood estimator because it has (under suitable assumptions) many optimality properties.
In the regression setting (with Gaussian errors), the least squares estimator is commonly used.
However, all these well-known procedures suffer from a major weakness: these nice properties are based on the assumption that the statistical model contains the true distribution of our data.
In practice, this assumption is questionable.
At most, we can expect that the model gives a reasonable approximation of reality but does not describe it perfectly.
This means that the distribution of our data should be close to our statistical model but may not belong to it.
One could nevertheless think that these estimators would still perform well when the model is approximate only, but this is not usually the case.
In fact, it is well known that the maximum likelihood estimator in density estimation or the least squares in regression become unstable when the model is misspecified.
Sometimes it only takes one or two outliers among a million "good" data points for these procedures to give poor results.
The question to be addressed in this talk is whether it is possible to design general statistical methods that have good estimation properties not only when the model is exact but also when it is only approximate.
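The sensitivity to a single outlier is easy to reproduce in the Gaussian location model, where the maximum likelihood estimator is the sample mean. The sketch below uses synthetic, deterministic data, and the sample median serves only as a familiar robust comparator; it is not the estimator proposed in the talk.

```python
import statistics

# 100,500 "good" observations, symmetric around 0 and bounded by 1 in absolute value.
clean = [((k % 201) - 100) / 100.0 for k in range(201 * 500)]

# The same sample with one gross outlier appended.
contaminated = clean + [1.0e9]

mean_clean = statistics.fmean(clean)
mean_contaminated = statistics.fmean(contaminated)
median_contaminated = statistics.median(contaminated)
print(mean_clean, mean_contaminated, median_contaminated)
```

One observation out of more than a hundred thousand moves the mean by several thousand units, while the median is unaffected.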
