Abstract
Generalized linear models with categorical explanatory variables are considered, and the parameters of the model are estimated by an exact maximum likelihood method. The existence of a sequence of maximum likelihood estimators is discussed, and considerations on possible link functions are proposed. Focus is then placed on two particular positive distributions: the Pareto 1 distribution and the shifted log-normal distribution. Finally, the approach is illustrated on an actuarial dataset to model insurance losses.
References
Albert A, Anderson JA (1984) On the existence of maximum likelihood estimates in logistic regression models. Biometrika 71(1):1–10
Beirlant J, Goegebeur Y (2003) Regression with response distributions of Pareto-type. Comput Stat Data Anal 42(4):595–619
Beirlant J, Goegebeur Y, Verlaak R, Vynckier P (1998) Burr regression and portfolio segmentation. Insur Math Econ 23(3):231–250
Beirlant J, Goegebeur Y, Teugels J, Segers J (2004) Statistics of extremes: theory and applications. Wiley, Hoboken
Bühlmann H, Gisler A (2006) A course in credibility theory and its applications. Springer, Berlin
Chavez-Demoulin V, Embrechts P, Hofert M (2015) An extreme value approach for modeling operational risk losses depending on covariates. J Risk Insur 83(3):735–776
Davison A, Smith R (1990) Models for exceedances over high thresholds. J R Stat Soc Ser B 52(3):393–442
Fahrmeir L, Kaufmann H (1985) Consistency and asymptotic normality of the maximum likelihood estimator in generalized linear models. Ann Stat 13(1):342–368
Fienberg SE (2007) The analysis of cross-classified categorical data, 2nd edn. Springer, Berlin
Goldburd M, Khare A, Tevet D (2016) Generalized linear models for insurance rating. CAS monograph series number 5. Casualty Actuarial Society, Arlington
Haberman SJ (1974) Log-linear models for frequency tables with ordered classifications. Biometrics 30(4):589–600
Hambuckers J, Heuchenne C, Lopez O (2016) A semiparametric model for generalized Pareto regression based on a dimension reduction assumption. HAL. https://hal.archives-ouvertes.fr/hal-01362314/
Hogg RV, Klugman SA (1984) Loss distributions. Wiley, Hoboken
Johnson N, Kotz S, Balakrishnan N (2000) Continuous univariate distributions, vol 1, 2nd edn. Wiley, Hoboken
Lehmann EL, Casella G (1998) Theory of point estimation, 2nd edn. Springer, Berlin
Lipovetsky S (2015) Analytical closed-form solution for binary logit regression by categorical predictors. J Appl Stat 42(1):37–49
McCullagh P, Nelder JA (1989) Generalized linear models, vol 37. CRC Press, Boca Raton
Nelder JA, Wedderburn RWM (1972) Generalized linear models. J R Stat Soc Ser A 135(3):370–384
Ohlsson E, Johansson B (2010) Non-life insurance pricing with generalized linear models. Springer, Berlin
Olver FWJ, Lozier DW, Boisvert RF, Clark CW (eds) (2010) NIST handbook of mathematical functions. Cambridge University Press, Cambridge
Ozkok E, Streftaris G, Waters HR, Wilkie AD (2012) Bayesian modelling of the time delay between diagnosis and settlement for critical illness insurance using a Burr generalised-linear-type model. Insur Math Econ 50(2):266–279
R Core Team (2019) R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna. https://www.R-project.org/
Reiss R, Thomas M (2007) Statistical analysis of extreme values, 3rd edn. Birkhäuser, Basel
Rigby R, Stasinopoulos D (2005) Generalized additive models for location, scale and shape. Appl Stat 54(3):507–554
Silvapulle MJ (1981) On the existence of maximum likelihood estimators for the binomial response models. J R Stat Soc Ser B (Methodological) 43(3):310–313
Smyth GK, Verbyla AP (1999) Adjusted likelihood methods for modelling dispersion in generalized linear models. Environmetrics 10(6):696–709
Venables W, Ripley B (2002) Modern applied statistics with S. Springer, Berlin
Acknowledgements
This research benefited also from the support of the ‘Chair Risques Emergents ou atypiques en Assurance’, under the aegis of Fondation du Risque, a joint initiative by Le Mans University, Ecole Polytechnique and MMA company, member of Covea group. The authors thank Vanessa Desert for her active support during the writing of this paper. The authors are also very grateful for the useful suggestions of the two referees. This work is supported by the research project “PANORisk” and Région Pays de la Loire (France).
Appendices
Proofs of Sect. 3
1.1 Proof for the one-variable case
Proof of Theorem 3.1
We have to solve the system
The system \(S(\varvec{\vartheta })=0\) is
that is
The first equation in the previous system is redundant, and
Hence, if \(Y_i\) takes values in \(\mathbb {Y}\subset b'(\varLambda )\) and \(\ell \) is injective, we have
The system (23) is
Let us compute the determinant of the matrix \(M_d = \left( \begin{array}{c} \varvec{Q} \\ \varvec{R}\end{array}\right) \). Consider \(\varvec{R} = (r_0,r_1,\ldots ,r_d)\). We have
The determinant can be computed recursively
Since \( \det (M_1) = -r_0+ r_1 \) and \( \det (M_2) = -r_2 -(-r_0 +r_1) = r_0 - r_1 - r_2, \) we get \(\det (M_d) = (-1)^d r_0+ (-1)^{d+1}(r_1+\dots + r_d) =(-1)^d( r_0 - r_1-\dots -r_d)\). This determinant is nonzero as long as \(r_0 \ne \sum _{j=1}^d r_j\).
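As a numerical sanity check, the closed form of \(\det (M_d)\) can be compared with a direct computation. A minimal sketch in Python, assuming (as the base case \(\det (M_1) = -r_0+r_1\) suggests) that \(\varvec{Q}\) is the one-variable design matrix \([\varvec{1}_d \,|\, I_d]\):

```python
import numpy as np

def det_Md(r):
    """Determinant of M_d = (Q; R) with the assumed one-variable design
    Q = [1_d | I_d] and contrast row R = (r_0, r_1, ..., r_d)."""
    d = len(r) - 1
    Q = np.hstack([np.ones((d, 1)), np.eye(d)])     # d x (d+1)
    M = np.vstack([Q, np.asarray(r, dtype=float)])  # (d+1) x (d+1)
    return np.linalg.det(M)

def closed_form(r):
    """(-1)^d (r_0 - r_1 - ... - r_d)."""
    d = len(r) - 1
    return (-1) ** d * (r[0] - sum(r[1:]))

# check the closed form against direct determinants for several d
rng = np.random.default_rng(0)
for d in range(1, 6):
    r = rng.normal(size=d + 1)
    assert abs(det_Md(r) - closed_form(r)) < 1e-9
```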
Now we compute the inverse of the matrix \(M_d\) by direct inversion.
Let us check the inverse of \(M_d\)
So as long as \(r_0 \ne \sum _{j=1}^d r_j\)
Alternatively, the system (24) is equivalent to
and for \((\varvec{Q}\, \varvec{R})\) of full rank, the matrix \((\varvec{Q}'\varvec{Q} + \varvec{R}'\varvec{R})\) is invertible and \( {\varvec{\vartheta }} = (\varvec{Q}'\varvec{Q} + \varvec{R}'\varvec{R})^{-1}\varvec{Q}'\varvec{g({\bar{Y}})}. \)\(\square \)
Examples—Choice of the contrast vector \(\varvec{R}\)
- 1.
Taking \(r_0=1, \varvec{r}=\varvec{0}\) leads to \( -r_0 + \varvec{r} \varvec{1}_d=-1 \Rightarrow \widehat{\varvec{\vartheta }}_n = \left( \begin{array}{c} 0 \\ \varvec{g({\bar{Y}})} \end{array}\right) . \)
- 2.
Taking \(r_0=0, \varvec{r}=(1,\varvec{0})\) leads to
$$\begin{aligned} -r_0 + \varvec{r} \varvec{1}_d=1 \Rightarrow \widehat{\varvec{\vartheta }}_n = \left( \begin{array}{c} g({\bar{Y}}_n^{(1)})\\ 0\\ g({\bar{Y}}_n^{(2)}) - g({\bar{Y}}_n^{(1)})\\ \vdots \\ g({\bar{Y}}_n^{(d)}) - g({\bar{Y}}_n^{(1)}) \end{array}\right) . \end{aligned}$$
- 3.
Taking \(r_0=0, \varvec{r}=\varvec{1}\) leads to
$$\begin{aligned} -r_0 + \varvec{r} \varvec{1}_d=d \Rightarrow \widehat{\varvec{\vartheta }}_n = \left( \begin{array}{c} \overline{\varvec{g({\bar{Y}})}}\\ g({\bar{Y}}_n^{(1)}) - \overline{\varvec{g({\bar{Y}})}}\\ \dots \\ g({\bar{Y}}_n^{(d)}) - \overline{\varvec{g({\bar{Y}})}} \end{array}\right) , \text { with } \overline{\varvec{g({\bar{Y}})}} = \dfrac{1}{d}\displaystyle \sum _{j=1}^dg(\overline{Y}_n^{(j)}). \end{aligned}$$
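The three choices of contrast vector above can be verified numerically against the closed form \(\widehat{\varvec{\vartheta }}_n = (\varvec{Q}'\varvec{Q} + \varvec{R}'\varvec{R})^{-1}\varvec{Q}'\varvec{g({\bar{Y}})}\). A minimal sketch, assuming the one-variable design \(\varvec{Q}=[\varvec{1}_d\,|\,I_d]\) and stand-in values for \(\varvec{g({\bar{Y}})}\):

```python
import numpy as np

d = 4
g = np.array([0.5, -1.2, 2.0, 0.3])           # stand-in for g(Ybar^(j)), j = 1..d
Q = np.hstack([np.ones((d, 1)), np.eye(d)])   # assumed one-variable design

def mle(R):
    """theta_hat = (Q'Q + R'R)^{-1} Q' g(Ybar) under contrast row R."""
    R = np.asarray(R, dtype=float).reshape(1, -1)
    return np.linalg.solve(Q.T @ Q + R.T @ R, Q.T @ g)

# Case 1: r_0 = 1, r = 0  ->  zero intercept, effects g(Ybar)
t1 = mle([1.0] + [0.0] * d)
assert np.allclose(t1, np.concatenate([[0.0], g]))

# Case 2: r_0 = 0, r = (1, 0, ..., 0)  ->  first level as reference
t2 = mle([0.0, 1.0] + [0.0] * (d - 1))
assert np.allclose(t2, np.concatenate([[g[0], 0.0], g[1:] - g[0]]))

# Case 3: r_0 = 0, r = 1  ->  intercept mean(g), effects as deviations
t3 = mle([0.0] + [1.0] * d)
assert np.allclose(t3, np.concatenate([[g.mean()], g - g.mean()]))
```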
Proof of Remark 3.4
We have to solve the system
If \(\ell \) is injective, the system simplifies to
\(\square \)
Proof of Remark 3.5
Let \(Y_i\) be from the exponential family \(F_{exp}(a,b,c,\lambda ,\phi )\). It is well known that the moment generating function of \(Y_i\) is
Hence, the moment generating function of the average \({\overline{Y}}_m\) is
We thus recover the known result that \({\overline{Y}}_m\) belongs to the exponential family \(F_{exp}(x\mapsto a(x)/m,b,c,\lambda ,\phi )\) (e.g. McCullagh and Nelder 1989).
In our setting, the random variables in the average \(\overline{Y}_n^{(j)}\) are i.i.d. with functions a, b, c and parameters \(\lambda =\ell (\vartheta _{(1)}+\vartheta _{(j)})\) and \(\phi \). Hence \({\overline{Y}}_n^{(j)}\) also belongs to the exponential family with the same parameters but with the function \({\bar{a}}:x\mapsto a(x)/m_j\). In particular,
However, the computation of \(\mathbf {E}g({\overline{Y}}_n^{(j)})\) remains difficult unless g is a linear function. By the strong law of large numbers, as \(m_j\rightarrow +\infty \), the estimator is consistent since
By the Central Limit Theorem (i.e. \({\overline{Y}}_n^{(j)}\) converges in distribution to a normal distribution) and the Delta Method, we obtain the following
\(\square \)
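The consistency and delta-method asymptotics of this remark can be illustrated by simulation. A minimal sketch with hypothetical choices (exponentially distributed \(Y_i\) with mean \(\mu \) and \(g=\log \)), checking that \(\mathrm {Var}\,g({\overline{Y}}_m)\approx g'(\mu )^2\sigma ^2/m\):

```python
import numpy as np

rng = np.random.default_rng(42)
mu, m, reps = 2.0, 500, 5000   # hypothetical mean, group size, replications

# Y_i ~ Exponential with mean mu, so sigma^2 = mu^2; take g = log, g'(mu) = 1/mu
means = rng.exponential(scale=mu, size=(reps, m)).mean(axis=1)
g_means = np.log(means)

# Consistency: g(Ybar_m) concentrates around g(mu)
assert abs(g_means.mean() - np.log(mu)) < 0.01

# Delta method: Var g(Ybar_m) ~ g'(mu)^2 * sigma^2 / m = 1 / m here
assert abs(g_means.var() * m - 1.0) < 0.1
```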
Proof of Corollary 3.1
The log likelihood of \(\widehat{\varvec{\vartheta }}_n\) is defined by
In fact, we must verify that \(\ell ({\widehat{\eta }}_i)\) does not depend on the function g. If we consider \(\widehat{\varvec{\vartheta }}_n\) defined by (8), we have \(\varvec{Q}\widehat{\varvec{\vartheta }}_n = \varvec{g({\bar{y}})}\), since \(\widehat{\varvec{\vartheta }}_n\) is a solution of the system (23), i.e. \(\varvec{Q}(\varvec{Q}'\varvec{Q} + \varvec{R}'\varvec{R})^{-1}\varvec{Q}'=I\). Using \({\widehat{\eta }}_i= (\varvec{Q}\widehat{\varvec{\vartheta }}_n)_j\) for i such that \(x_i^{(2),j}=1\), we obtain
and
In the same way,
\(\square \)
1.2 Proof for the two-variable case
Proof of Theorem 3.2
The system \(S(\varvec{\vartheta })=0\) is
that is
The system has exactly \(1+d_2+d_3\) redundancies, and \(S(\varvec{\vartheta })=0\) reduces to
Hence the system has rank \({ KL}^\star \) and, if \(Y_i\) takes values in \(\mathbb {Y}\subset b'(\varLambda )\) and \(\ell \) is injective, we have
As in the proof of Theorem 3.1, we have to solve
that is, since \(\varvec{Q}\varvec{Q}'+\varvec{R}\varvec{R}'\) has full rank, as in the proof of Theorem 3.1,
In that case, the MLE solves a least-squares problem with response variable \(\varvec{g({\bar{Y}})}\), explanatory variables \(\varvec{Q}\), under the linear constraint given by \(\varvec{R}\).
- 1.
Under the linear contrasts (\({\tilde{C}}_0\)), model (10) is equivalent to model (6) with \(J=KL^\star \) modalities. Hence the solution is immediate.
- 2.
Under the linear contrasts (\({\tilde{C}}_\varSigma \)), the system
$$\begin{aligned} \vartheta _{(1)}+\vartheta _{(2),k} + \vartheta _{(3),l} + \vartheta _{kl} = g({\bar{Y}}_n^{(k,l)})\quad \forall (k,l)\in KL^\star \end{aligned}$$implies that
$$\begin{aligned} \sum _{(k,l)\in KL^\star }m_{k,l}(\vartheta _{(1)}+\vartheta _{(2),k} + \vartheta _{(3),l} + \vartheta _{kl}) = \sum _{(k,l)\in KL^\star }m_{k,l} g({\bar{Y}}_n^{(k,l)}). \end{aligned}$$Using
$$\begin{aligned} \sum _{(k,l)\in KL^\star }m_{k,l}&= n,\\ \sum _{(k,l)\in KL^\star }m_{k,l}\vartheta _{(2),k}&= \sum _{k\in K}\sum _{l\in L^\star _k}m_{k,l}\vartheta _{(2),k} = \sum _{k\in K}m^{(2)}_k\vartheta _{(2),k}= 0,\\ \sum _{(k,l)\in KL^\star }m_{k,l}\vartheta _{(3),l}&= \sum _{l\in L}\sum _{k\in K^\star _l}m_{k,l}\vartheta _{(3),l}= \sum _{l\in L}m^{(3)}_l\vartheta _{(3),l}= 0,\\ \sum _{(k,l)\in KL^\star }m_{k,l}\vartheta _{kl}&=0, \end{aligned}$$we get \(\vartheta _{(1)} = \dfrac{1}{n}\displaystyle \sum \nolimits _{(k,l)\in KL^\star }m_{k,l} g({\bar{Y}}_n^{(k,l)}).\) In the same way, taking the summation over \(K^\star _l\) for \(l\in L\) and over \(L^\star _k\) for \(k\in K\), we find \(\vartheta _{(2),k}\) and \(\vartheta _{(3),l}\), and then \(\vartheta _{kl}\).
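The earlier observation that the MLE solves a constrained least-squares problem can also be checked numerically. A minimal sketch, assuming for illustration the one-variable design \(\varvec{Q}=[\varvec{1}_d\,|\,I_d]\) with a sum-contrast row \(\varvec{R}\) (the two-variable design of Theorem 3.2 is not reproduced here), comparing the closed form with the KKT solution of \(\min \Vert \varvec{Q}\varvec{\vartheta }-\varvec{g}\Vert ^2\) subject to \(\varvec{R}\varvec{\vartheta }=0\):

```python
import numpy as np

d = 3
g = np.array([1.0, 2.5, -0.5])                 # stand-in for g(Ybar)
Q = np.hstack([np.ones((d, 1)), np.eye(d)])    # assumed one-variable design
R = np.array([[0.0, 1.0, 1.0, 1.0]])           # sum-contrast row

# Closed form of Theorem 3.1
theta_cf = np.linalg.solve(Q.T @ Q + R.T @ R, Q.T @ g)

# Constrained least squares via the KKT system:
#   minimize ||Q theta - g||^2   subject to   R theta = 0
K = np.block([[Q.T @ Q, R.T], [R, np.zeros((1, 1))]])
rhs = np.concatenate([Q.T @ g, [0.0]])
theta_ls = np.linalg.solve(K, rhs)[: d + 1]

assert np.allclose(theta_cf, theta_ls)         # same estimator
assert abs((R @ theta_cf).item()) < 1e-9       # constraint satisfied
```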
With main effects only, the system \(S(\varvec{\vartheta })=0\) is
There are \(1+d_2+d_3\) equations for \(1+d_2+d_3\) parameters, but the explanatory variables are collinear. So the two additional constraints \(\varvec{R}\varvec{\vartheta }=0\) ensure that a solution exists for the remaining \(d_2+d_3-1\) parameters. Using \(\sum _k x_i^{(2),k}=1\), the second set of equations becomes, \(\forall l\in L\),
Similarly, the third set of equations becomes \(\forall k\in K\)
Even with a canonical link \(\ell (x)=x\), so that \(\ell '(x)=1\), this system is not a least-squares problem for a nonlinear function g. \(\square \)
Calculus of the Log-likelihoods appearing in Sects. 4 and 5
Consider the Pareto GLM described in (13) and (15). The b function is \(b(\lambda ) = -\log (\lambda )\); using Corollary 3.1, we have \(\ell (\hat{\eta }_i) = (b')^{-1}(\overline{z}_n^{(j)})=-(\overline{z}_n^{(j)})^{-1}\) for j such that \(x_i^{(2),j}=1\) and
Let us compute the original log-likelihood of the Pareto 1 distribution:
Hence with \(z_i=-\log (y_i/\mu )\),
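As a numerical illustration of this closed form, one can simulate group-wise Pareto 1 samples and recover the shape parameter from \(\overline{z}_n^{(j)}\). A minimal sketch with hypothetical group parameters, assuming the threshold \(\mu \) is known:

```python
import numpy as np

rng = np.random.default_rng(1)
mu = 1.0                                   # known Pareto 1 threshold
lam_true = {"group1": 1.5, "group2": 3.0}  # hypothetical group-wise shapes

lam_hat = {}
for j, lam in lam_true.items():
    m_j = 100_000
    y = mu * rng.uniform(size=m_j) ** (-1.0 / lam)  # Pareto 1 draws by inversion
    z_bar = -np.log(y / mu).mean()                  # z_i = -log(y_i / mu)
    lam_hat[j] = -1.0 / z_bar                       # (b')^{-1}(zbar), b(l) = -log(l)
    assert abs(lam_hat[j] - lam) / lam < 0.02       # close to the true shape
```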
Now consider the shifted log-normal GLM described in (18) and (19). Here, the b function is \(b(\lambda )=\lambda ^2/2\); hence, using Corollary 3.1, we have \(\ell (\hat{\eta }_i) = (b')^{-1}(\overline{z}_n^{(j)})=\overline{z}_n^{(j)}\) for j such that \(x_i^{(2),j}=1\) and Eq. (21) holds.
Let us compute the original log-likelihood of the shifted log-normal distribution:
with \(z_i=\log (y_i-\mu )\). Hence
Using \( {\widehat{\phi }} = \frac{1}{n}\sum _{j\in J}\sum _{i, x_i^{(2),j}=1}\left( z_i - {\bar{z}}_n^{(j)}\right) ^2 \) leads to the desired result.
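These closed-form estimators \(\widehat{\lambda }_j={\bar{z}}_n^{(j)}\) and \({\widehat{\phi }}\) can be checked by simulation. A minimal sketch with hypothetical shift, dispersion, and group locations:

```python
import numpy as np

rng = np.random.default_rng(7)
mu, phi = 0.5, 0.25                  # hypothetical shift and dispersion
lam_true = {1: -0.3, 2: 0.8}         # hypothetical group locations lambda_j

resid, lam_hat = [], {}
for j, l in lam_true.items():
    z = rng.normal(l, np.sqrt(phi), size=50_000)  # z_i = log(y_i - mu)
    y = mu + np.exp(z)                            # shifted log-normal draws
    z_grp = np.log(y - mu)
    lam_hat[j] = z_grp.mean()                     # lambda_hat_j = zbar^(j)
    resid.append(z_grp - lam_hat[j])

# pooled (1/n) sum of squared deviations, as in the formula above
phi_hat = np.concatenate(resid).var()

assert all(abs(lam_hat[j] - lam_true[j]) < 0.02 for j in lam_true)
assert abs(phi_hat - phi) < 0.01
```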
Link functions and descriptive statistics
Cite this article
Brouste, A., Dutang, C. & Rohmer, T. Closed-form maximum likelihood estimator for generalized linear models in the case of categorical explanatory variables: application to insurance loss modeling. Comput Stat 35, 689–724 (2020). https://doi.org/10.1007/s00180-019-00918-7