My Publications

Submitted papers and preprints

Tree Pólya splitting models with zero inflation. Application to forecasting joint species distribution

Biometrics

F. Moudjeu, J. Peyhardi, M. Réjou-Méchain, P. Soh Takam, F. Mortier

Abstract

Species distribution models, which consist of a species-by-species modeling approach, are widely used in ecology to understand species behavior and predict their distribution based on environmental data. However, in species-rich ecosystems with many rare species, such an approach is doomed to failure. Moreover, univariate approaches ignore species dependencies, whereas biodiversity is not merely the sum of species but the result of multiple interactions. Modeling multivariate count data in a way that allows for flexible dependencies, as well as zero inflation and overdispersion, is a challenge. In this paper, we develop a new family of models called the zero-inflated binary tree Pólya-splitting models. This family allows the decomposition of multivariate count data into successive sub-models along a known binary partition tree. In the first part, we present the general form of this model and study its marginal and conditional properties (distributions and moments). The second part presents the extension to the regression context. Finally, we present results from a real case study based on an impressive data set consisting of the abundance of more than 180 tree taxa sampled on 1,571 plots covering more than 6 million hectares of tropical rainforests in the Congo basin.

Published papers

2025

Tree Pólya splitting models for multivariate count data

Journal of Multivariate Analysis

S. Valiquette, J. Peyhardi, E. Marchand, G. Toulemonde, F. Mortier

Abstract

In this article, we develop a new class of multivariate distributions adapted for count data, called Tree Pólya Splitting. This class results from the combination of a univariate distribution and singular multivariate distributions along a fixed partition tree. As we will demonstrate, these distributions are flexible, allowing for the modeling of complex dependencies (positive, negative, or null) at the observation level. Specifically, we present the theoretical properties of Tree Pólya Splitting distributions by focusing primarily on marginal distributions, factorial moments, and dependency structures (covariance and correlations). The abundance of 17 species of Trichoptera recorded at 49 sites is used, on the one hand, to illustrate the theoretical properties developed in this article on a concrete case and, on the other hand, to demonstrate the interest of this type of model, notably by comparing it to classical approaches in ecology or microbiome research.

2025

GLMcat: An R Package for Generalized Linear Models for Categorical Responses

Journal of Statistical Software

L. Leon, J. Peyhardi, C. Trottier

Abstract

In statistical modeling, there is a wide variety of generalized linear models for categorical response variables (nominal or ordinal responses); yet, there is no software embracing all these models together in a unique and generic framework. We propose and present GLMcat, an R package to estimate generalized linear models implemented under the unified specification (r, F, Z), where r represents the ratio of probabilities (reference, cumulative, adjacent, or sequential), F the cumulative distribution function for the linkage, and Z the design matrix. All classical models (and their variations) for categorical data can be written as an (r, F, Z) triplet, so they can be fitted with GLMcat. The functions in the package are intuitive and user-friendly. For each of the three components, there are multiple alternatives from which the user should carefully select those that best address the objectives of the analysis. The main strengths of the GLMcat package are the possibility of choosing from a large number of link functions (defined by the composition of F and r) and the simplicity of setting constraints in the linear predictor, either on the intercepts or on the slopes. This paper proposes a methodological and practical guide for the appropriate selection of a model considering the concordance between the nature of the data and the properties of the model.

2025

Performance of the Student binary regression model according to the data separation setting

Communications in Statistics - Simulation and Computation

L. Leon, J. Peyhardi, C. Trottier

Abstract

The link function is the key component of regression models for binary response variables. Despite the diverse potential fits obtained from different link functions, only the logit and the probit links have been widely popularized. Maximum likelihood estimates in models generated from these links are known to be non-robust in the presence of outliers. We show that this problem is exacerbated when the two response levels are strongly separated in the explanatory space. To address this shortcoming, we propose and encourage the use of maximum likelihood estimation with the Student link function. We highlight its robustness to outliers and also to noisy variables, particularly when the data exhibit a strong separation setting, while retaining all the properties of maximum likelihood estimation.

2024

Integer autoregressive models based on quasi Pólya thinning operator

Statistical Inference for Stochastic Processes

J. Peyhardi

Abstract

Autoregressive models adapted to count time series have received less attention than their classical counterparts for continuous time series. The main approach is based on a stochastic thinning operation that preserves the discrete nature of the variable between successive times. The binomial thinning operator is the most popular, and the Poisson distribution emerges as the natural choice for the residual distribution of the process. The present paper introduces the quasi Pólya thinning operators, which include the binomial thinning operator as a special case. The family of additive modified power series distributions is defined and is shown to be the natural choice for the residual distribution of the process. We obtain the most general class of integer-valued autoregressive models of order 1 (INAR(1) models) with margins having analytic form and the property of closure under convolution introduced by Joe (1996). It includes the usual cases of Poisson and generalized Poisson margins, but also the less usual cases of binomial and negative binomial margins and the new case of generalized negative binomial margins. These models cover a wide range of dispersion, strictly ordered from the binomial case to the generalized negative binomial case. Asymptotic normality of the maximum likelihood estimator (MLE) for such INAR(1) models is obtained. The class of integer-valued moving average models of order q (INMA(q) models) based on the quasi Pólya thinning operator is also introduced. Finally, the proposed INAR(1) models are applied to simulated and real datasets.
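
As a point of reference for the abstract above, the classical special case it mentions (binomial thinning with Poisson residuals) can be sketched in a few lines. This is only an illustration of that well-known baseline, not the quasi Pólya operator of the paper; the function names and parameter values are hypothetical choices made for this sketch.

```python
import math
import random

def binomial_thinning(x, alpha, rng):
    """Return alpha o x: each of the x units survives independently with probability alpha."""
    return sum(1 for _ in range(x) if rng.random() < alpha)

def poisson_draw(lam, rng):
    """Draw from Poisson(lam) with Knuth's multiplication method (adequate for small lam)."""
    threshold = math.exp(-lam)
    k, p = 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k

def simulate_inar1(n, alpha, lam, rng):
    """Simulate X_t = alpha o X_{t-1} + eps_t with eps_t ~ Poisson(lam), 0 <= alpha < 1."""
    # Start from the stationary Poisson(lam / (1 - alpha)) law of this classical model.
    x = poisson_draw(lam / (1.0 - alpha), rng)
    path = []
    for _ in range(n):
        x = binomial_thinning(x, alpha, rng) + poisson_draw(lam, rng)
        path.append(x)
    return path

if __name__ == "__main__":
    rng = random.Random(0)
    path = simulate_inar1(5000, alpha=0.5, lam=2.0, rng=rng)
    # The sample mean should be close to lam / (1 - alpha) = 4.
    print(sum(path) / len(path))
```

The thinning step keeps the process integer-valued, which is the property the abstract emphasizes; in this classical case the stationary margin is Poisson with mean lam / (1 - alpha).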

2023

On quasi Pólya thinning operator

Brazilian Journal of Probability and Statistics

Jean Peyhardi

Abstract

A thinning operation is a stochastic operation that shrinks a random count variable into another one. This kind of random operation was intensively studied during the seventies to characterize some count distributions, such as the Poisson distribution using the binomial thinning operator (also named the binomial damage model). Then, closure under the thinning operator was studied in order to define some classes of integer-valued autoregressive (INAR) models for count time series. These two properties are studied in this paper for the new class of quasi Pólya thinning operators. Classical results concerning the binomial thinning operator are recovered as a special case. The quasi Pólya thinning operator is related to the new class of quasi Pólya splitting distributions, defined for multivariate count data. The probabilistic graphical model (PGM) of these multivariate distributions is characterized. Finally, a general class of integer-valued autoregressive models is introduced, including the usual cases of Poisson and generalized Poisson margins as special cases and the generalized negative binomial as a new case.

2024

Pólya-splitting distributions as stationary solutions of multivariate birth-death processes under extended neutral theory

Journal of Theoretical Biology

J. Peyhardi, F. Laroche, F. Mortier

Abstract

Multivariate count distributions are crucial for the inference of ecological processes underpinning biodiversity. In particular, neutral theory provides useful null distributions allowing the evaluation of adaptation or natural selection. In this paper, we build a broader family of multivariate distributions: the Pólya-splitting distributions. We show that they emerge naturally as stationary distributions of a multivariate birth-death process. This family of distributions is a consistent extension of non-zero sum neutral models based on a master equation approach. It allows considering both the total abundance of the community and the relative abundances of species. We emphasize that this family is large enough to encompass various dependence structures among species. We also introduce the strong closure under addition property, which can be useful to generate nested multi-level dependence structures. Although not all Pólya-splitting distributions share this property, we provide numerous examples verifying it. They include the previously known example with independent species, and also new ones with alternative dependence structures. Overall, we advocate that Pólya-splitting distributions should become part of the classic toolbox for the analysis of multivariate count data in ecology, providing alternative approaches to the joint species distribution framework. Comparatively, our approach allows modeling dependencies between species at the observation level, while classical JSDMs model dependencies at the latent process level.

2023

Asymptotic tail properties of Poisson mixture distributions

Stat

S. Valiquette, G. Toulemonde, J. Peyhardi, E. Marchand, F. Mortier

Abstract

Count data are omnipresent in many applied fields, often with overdispersion. With mixtures of Poisson distributions representing an elegant and appealing modelling strategy, we focus here on how the tail behaviour of the mixing distribution is related to the tail of the resulting Poisson mixture. We define five sets of mixing distributions and we identify for each case whether the Poisson mixture is in, close to or far from a domain of attraction of maxima. We also characterize how the Poisson mixture behaves similarly to a standard Poisson distribution when the mixing distribution has a finite support. Finally, we study, both analytically and numerically, how goodness-of-fit can be assessed with the inspection of tail behaviour.

2021

Splitting models for multivariate count data

Journal of Multivariate Analysis

J. Peyhardi, P. Fernique, J.B. Durand

Abstract

We investigate the class of splitting distributions as the composition of a singular multivariate distribution and a univariate distribution. It will be shown that most common parametric count distributions (multinomial, negative multinomial, multivariate hypergeometric, multivariate negative hypergeometric, ...) can be written as splitting distributions with separate parameters for both components, thus facilitating their interpretation, inference, the study of their probabilistic characteristics and their extensions to regression models. We highlight many probabilistic properties deriving from the compound aspect of splitting distributions and their underlying algebraic properties. Parameter inference and model selection are thus reduced to two separate problems, preserving time and space complexity of the base models. Based on this principle, we introduce several new distributions. In the case of multinomial splitting distributions, conditional independence and asymptotic normality properties for estimators are obtained. Mixtures of splitting regression models are used on a mango tree dataset in order to analyze the patchiness.
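
The composition described in this abstract can be illustrated with the simplest textbook instance: a univariate Poisson total that is then split by a multinomial (a singular distribution, since the components sum to the total). A classical result is that the resulting components are independent Poisson variables with means lam * p_j. The sketch below is only an illustration under those assumptions; the function names and numerical values are hypothetical and do not come from the paper.

```python
import math
import random

def poisson_draw(lam, rng):
    """Draw from Poisson(lam) with Knuth's multiplication method (adequate for small lam)."""
    threshold = math.exp(-lam)
    k, p = 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k

def multinomial_split(total, probs, rng):
    """Allocate `total` units to len(probs) categories with probabilities `probs`."""
    counts = [0] * len(probs)
    for _ in range(total):
        u, cum = rng.random(), 0.0
        for j, p in enumerate(probs):
            cum += p
            if u < cum:
                counts[j] += 1
                break
        else:
            counts[-1] += 1  # guard against floating-point round-off in the cumulative sum
    return counts

def draw_splitting(lam, probs, rng):
    """Draw the sum from Poisson(lam), then split it with a multinomial."""
    total = poisson_draw(lam, rng)
    return multinomial_split(total, probs, rng)

if __name__ == "__main__":
    rng = random.Random(1)
    probs = [0.2, 0.3, 0.5]
    draws = [draw_splitting(6.0, probs, rng) for _ in range(10000)]
    # Component means should be close to 6 * probs = [1.2, 1.8, 3.0].
    print([sum(d[j] for d in draws) / len(draws) for j in range(3)])
```

Because the sum and the split have separate parameters (lam on one side, probs on the other), each part can be estimated on its own, which is the separation the abstract highlights.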

2020

Robustness of Student link function in multinomial choice models

Journal of Choice Modelling

Jean Peyhardi

Abstract

The Student distribution has already been used to obtain a robust maximum likelihood estimator (MLE) in the framework of binary choice models. But, until recently, only the logit and probit binary models had been extended to the case of multinomial choices, resulting in the multinomial logit (MNL) and the multinomial probit (MNP). The recently introduced family of reference models defines a multivariate extension of any binary choice model, i.e. for any link function. In particular, this is the first extension of the binary robit to the case of multinomial choices. These models define the choice probability for category j relative to an (interchangeable) reference category. This paper highlights the robustness of reference models with the Student link function, by showing that the influence function is bounded. Inference of the MLE is detailed through Fisher's scoring algorithm, which is appropriate since reference models belong to the family of generalized linear models (GLMs). These models are compared to the MNL on the benchmark dataset of travel mode choice between Sydney and Melbourne. The results obtained on this dataset with reference models are completely different from those usually obtained with the MNL, nested logit (NL) or MNP, which failed to select relevant attributes. It will be shown that the travel mode choice is totally deterministic according to the transfer time. In fact, the use of the Student link function allows us to detect the totally artificial aspect of this famous dataset.

2019

A new family of qualitative choice models: An application of reference models to travel mode choice

Transportation Research Part B

H. Bouscasse, I. Joly, J. Peyhardi

Abstract

This paper considers the recently introduced family of reference models dedicated to non-ordered alternatives. The link function of reference models is that of the multinomial logit model (MNL), replacing the logistic cumulative distribution function (cdf) by other cdfs (e.g., Gumbel, Student). We determine all usual economic outputs (willingness-to-pay, elasticities, ...). We also show that the IIA property generally does not hold for this family of models, because of their non-invariance to the alternative chosen as a reference. We estimate and compare five reference models to the MNL on a travel mode-choice survey: according to the chosen cdf, reference models lead to a better fit and retrieve consistent estimates of economic outputs even when there is high unobserved heterogeneity.

2017

Item response models for the longitudinal analysis of health-related quality of life in cancer clinical trials

BMC Medical Research Methodology

A. Barbieri, J. Peyhardi, T. Conroy, S. Gourgou, C. Lavergne, C. Mollevi

Abstract

Background: The use of health-related quality of life (HRQoL) as an endpoint in cancer clinical trials is growing rapidly. Hence, research into the statistical approaches used to analyze HRQoL data is of major importance, and could lead to a better understanding of the impact of treatments on the everyday life and care of patients. Amongst the models that are used for the longitudinal analysis of HRQoL, we focused on the mixed models from item response theory, to directly analyze raw data from questionnaires. Methods: We reviewed the different item response models for ordinal responses, using a recent classification of generalized linear models for categorical data. Based on methodological and practical arguments, we then proposed a conceptual selection of these models for the longitudinal analysis of HRQoL in cancer clinical trials. Results: To complete comparison studies already present in the literature, we performed a simulation study based on the random part of the mixed models, so as to compare the classically used linear mixed model with the selected item response models. As expected, the sensitivity of the item response models to detect random effects with lower variance is better than that of the linear mixed model. We then used a cumulative item response model to perform a longitudinal analysis of HRQoL data from a cancer clinical trial. Conclusions: Adjacent and cumulative item response models seem particularly suitable for HRQoL analysis. In the specific context of cancer clinical trials and the comparison between two groups of HRQoL data over time, the cumulative model seems to be the most suitable, given that it is able to generate a more complete set of results and gives an intuitive illustration of the data.

2017

Characterization of convolution splitting graphical models

Statistics and Probability Letters

J. Peyhardi, P. Fernique

Abstract

We aim at characterizing graphical models for convolution splitting distributions. Only marginal independence has been studied, through the well-known Rao–Rubin condition. We generalize this condition for conditional independence and deduce the desired characterizations.

2016

Partitioned conditional generalized linear models for categorical responses

Statistical Modelling

J. Peyhardi, C. Trottier, Y. Guédon

Abstract

In categorical data analysis, several regression models have been proposed for hierarchically structured responses, such as the nested logit model, the two-step model or the partitioned conditional model for partially ordered set. The specifications of these models are heterogeneous and they have been formally defined for only two or three levels in the hierarchy. Here, we introduce the class of partitioned conditional generalized linear models (PCGLMs) that encompasses all these models and is defined for any number of levels in the hierarchy. The hierarchical structure of these models is fully specified by a partition tree of categories. Using the genericity of the recently introduced (r, F, Z) specification of generalized linear models (GLMs) for categorical responses, it is possible to use different link functions and explanatory variables for each partitioning step. PCGLMs thus constitute a very flexible framework for modelling hierarchically structured categorical responses including partially ordered responses.

2017

Integrative models for joint analysis of shoot growth and branching patterns

New Phytologist

J. Peyhardi, Y. Caraglio, E. Costes, P.E. Laury, C. Trottier, Y. Guédon

Abstract

Plants exhibit dependences between shoot growth and branching that generate highly structured patterns. The characterization of the patterning mechanism is still an open issue because of the developmental processes involved with both succession of events (e.g. internode elongation, axillary shoot initiation and elongation) and complex dependences among neighbouring positions along the parent shoot. Statistical models called semi-Markov switching partitioned conditional generalized linear models were built on the basis of apple and pear tree datasets. In these models, the semi-Markov chain represents both the succession and lengths of branching zones, whereas the partitioned conditional generalized linear models represent the influence of parent shoot growth variables on axillary productions within each branching zone. Parent shoot growth variables were shown to influence specific developmental events. On this basis, the growth and branching patterns of two apple tree (Malus domestica) cultivars, as well as of pear trees (Pyrus spinosa) between two successive growing cycles, were compared. The proposed integrative statistical models were able to decipher the roles of successive developmental events in the growth and branching patterning mechanisms. These models could incorporate other parent shoot explanatory variables, such as the local curvature or the maximum growth rate of the leaf.

2013

New insights for estimating the genetic value of segregating apple progenies for irregular bearing during the first years of tree production

Journal of Experimental Botany

J.B. Durand, B. Guitton, J. Peyhardi, Y. Holtz, Y. Guédon, C. Trottier, E. Costes

Abstract

Because irregular bearing generates major agronomic issues in fruit-tree species, particularly in apple, the selection of regular cultivars is desirable. Here, we aimed to define methods and descriptors allowing a diagnostic for bearing behaviour during the first years of tree maturity, when tree production is increasing. Flowering occurrences were collected at whole-tree and (annual) shoot scales on a segregating apple population. At both scales, the number of inflorescences over the years was modelled. Two descriptors were derived from model residuals: a new biennial bearing index, based on deviation around yield trend over years and an autoregressive coefficient, which represents dependency between consecutive yields. At the shoot scale, entropy was also considered to represent the within-tree flowering synchronicity. Clusters of genotypes with similar bearing behaviours were built. Both descriptors at the whole-tree and shoot scales were consistent for most genotypes and were used to discriminate regular from biennial and irregular genotypes. Quantitative trait loci were detected for the new biennial bearing index at both scales. Combining descriptors at a local scale with entropy showed that regular bearing at the tree scale may result from different strategies of synchronization in flowering at the local scale. The proposed methods and indices open an avenue to quantify bearing behaviour during the first years of tree maturity and to capture genetic variations. Their extension to other progenies and species, possible variants of descriptors, and their use in breeding programmes considering a limited number of years or fruit yields are discussed.

2015

A new specification of generalized linear models for categorical responses

Biometrika

J. Peyhardi, C. Trottier, Y. Guédon

Abstract

Many regression models for categorical responses have been introduced, motivated by different paradigms, but it is difficult to compare them because of their different specifications. In this paper we propose a unified specification of regression models for categorical responses, based on a decomposition of the link function into an inverse continuous cumulative distribution function and a ratio of probabilities. This allows us to define a new family of reference models for nominal responses, comparable to the families of adjacent, cumulative and sequential models for ordinal responses. A new equivalence between cumulative and sequential models is shown. Invariances under permutations of the categories are studied for each family of models. We introduce a reversibility property that distinguishes adjacent and cumulative models from sequential models. The new family of reference models is tested on three benchmark classification datasets.
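
To make the decomposition in this abstract concrete, here is a minimal sketch of a reference model. It assumes the reference ratio takes the form pi_j / (pi_j + pi_J) = F(eta_j) for j = 1, ..., J-1, with category J as the reference; that form is an assumption of this sketch rather than something stated in the abstract, and the function names are hypothetical. With the logistic cdf, this reduces to the usual multinomial logit; swapping in another cdf (here the Cauchy cdf, i.e. a Student distribution with one degree of freedom) gives a different member of the family.

```python
import math

def logistic_cdf(x):
    """Standard logistic cumulative distribution function."""
    return 1.0 / (1.0 + math.exp(-x))

def cauchy_cdf(x):
    """Standard Cauchy cdf, i.e. the Student cdf with one degree of freedom."""
    return 0.5 + math.atan(x) / math.pi

def reference_probabilities(etas, cdf):
    """Category probabilities when pi_j / (pi_j + pi_J) = cdf(eta_j), j = 1..J-1.

    `etas` holds the J-1 linear predictors; the last returned entry is the
    probability of the reference category J.
    """
    # pi_j / pi_J = F(eta_j) / (1 - F(eta_j)), then normalise so that all probabilities sum to 1.
    odds = [cdf(e) / (1.0 - cdf(e)) for e in etas]
    denom = 1.0 + sum(odds)
    return [o / denom for o in odds] + [1.0 / denom]

if __name__ == "__main__":
    etas = [0.4, -1.0]
    print(reference_probabilities(etas, logistic_cdf))  # coincides with the multinomial logit
    print(reference_probabilities(etas, cauchy_cdf))    # heavier-tailed alternative
```

In this reading, the choice of ratio r and cdf F are independent knobs, with the design matrix Z entering only through the linear predictors eta_j.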
