Abstract
Generalized linear models with categorical explanatory variables are considered, and the parameters of the model are estimated by an exact maximum likelihood method. The existence of a sequence of maximum likelihood estimators is discussed, and considerations on possible link functions are proposed. Focus is then placed on two particular positive distributions: the Pareto 1 distribution and the shifted log-normal distribution. Finally, the approach is illustrated on an actuarial dataset to model insurance losses.
References
Albert A, Anderson JA (1984) On the existence of maximum likelihood estimates in logistic regression models. Biometrika 71(1):1–10
Beirlant J, Goegebeur Y (2003) Regression with response distributions of Pareto-type. Comput Stat Data Anal 42(4):595–619
Beirlant J, Goegebeur Y, Verlaak R, Vynckier P (1998) Burr regression and portfolio segmentation. Insur Math Econ 23(3):231–250
Beirlant J, Goegebeur Y, Teugels J, Segers J (2004) Statistics of extremes: theory and applications. Wiley, Hoboken
Bühlmann H, Gisler A (2006) A course in credibility theory and its applications. Springer, Berlin
Chavez-Demoulin V, Embrechts P, Hofert M (2015) An extreme value approach for modeling operational risk losses depending on covariates. J Risk Insur 83(3):735–776
Davison A, Smith R (1990) Models for exceedances over high thresholds. J R Stat Soc Ser B 52(3):393–442
Fahrmeir L, Kaufmann H (1985) Consistency and asymptotic normality of the maximum likelihood estimator in generalized linear models. Ann Stat 13(1):342–368
Fienberg SE (2007) The analysis of cross-classified categorical data, 2nd edn. Springer, Berlin
Goldburd M, Khare A, Tevet D (2016) Generalized linear models for insurance rating. CAS monograph series number 5. Casualty Actuarial Society, Arlington
Haberman SJ (1974) Log-linear models for frequency tables with ordered classifications. Biometrics 30(4):589–600
Hambuckers J, Heuchenne C, Lopez O (2016) A semiparametric model for generalized Pareto regression based on a dimension reduction assumption. HAL. https://hal.archives-ouvertes.fr/hal-01362314/
Hogg RV, Klugman SA (1984) Loss distributions. Wiley, Hoboken
Johnson N, Kotz S, Balakrishnan N (2000) Continuous univariate distributions, vol 1, 2nd edn. Wiley, Hoboken
Lehmann EL, Casella G (1998) Theory of point estimation, 2nd edn. Springer, Berlin
Lipovetsky S (2015) Analytical closed-form solution for binary logit regression by categorical predictors. J Appl Stat 42(1):37–49
McCullagh P, Nelder JA (1989) Generalized linear models, vol 37. CRC Press, Boca Raton
Nelder JA, Wedderburn RWM (1972) Generalized linear models. J R Stat Soc Ser A 135(3):370–384
Ohlsson E, Johansson B (2010) Non-life insurance pricing with generalized linear models. Springer, Berlin
Olver FWJ, Lozier DW, Boisvert RF, Clark CW (eds) (2010) NIST handbook of mathematical functions. Cambridge University Press, Cambridge
Ozkok E, Streftaris G, Waters HR, Wilkie AD (2012) Bayesian modelling of the time delay between diagnosis and settlement for critical illness insurance using a Burr generalised-linear-type model. Insur Math Econ 50(2):266–279
R Core Team (2019) R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna. https://www.R-project.org/
Reiss R, Thomas M (2007) Statistical analysis of extreme values, 3rd edn. Birkhäuser, Basel
Rigby R, Stasinopoulos D (2005) Generalized additive models for location, scale and shape. Appl Stat 54(3):507–554
Silvapulle MJ (1981) On the existence of maximum likelihood estimators for the binomial response models. J R Stat Soc Ser B (Methodological) 43(3):310–313
Smyth GK, Verbyla AP (1999) Adjusted likelihood methods for modelling dispersion in generalized linear models. Environmetrics 10(6):696–709
Venables W, Ripley B (2002) Modern applied statistics with S. Springer, Berlin
Acknowledgements
This research benefited also from the support of the ‘Chair Risques Emergents ou atypiques en Assurance’, under the aegis of Fondation du Risque, a joint initiative by Le Mans University, Ecole Polytechnique and MMA company, member of Covea group. The authors thank Vanessa Desert for her active support during the writing of this paper. The authors are also very grateful for the useful suggestions of the two referees. This work is supported by the research project “PANORisk” and Région Pays de la Loire (France).
Appendices
Proofs of Sect. 3
1.1 Proof for the one-variable case
Proof of Theorem 3.1
We have to solve the system
The system \(S(\varvec{\vartheta })=0\) is
that is
The first equation in the previous system is redundant, and
Hence, if \(Y_i\) takes values in \(\mathbb {Y}\subset b'(\varLambda )\) and \(\ell \) is injective, we have
The system (23) is
Let us compute the determinant of the matrix \(M_d = \left( \begin{array}{c} \varvec{Q} \\ \varvec{R}\end{array}\right) \). Consider \(\varvec{R} = (r_0,r_1,\ldots ,r_d)\). We have
The determinant can be computed recursively
Since \( \det (M_1) = -r_0+ r_1 \) and \( \det (M_2) = -r_2 -(-r_0 +r_1) = r_0 - r_1 - r_2, \) we get \(\det (M_d) = (-1)^d r_0+ (-1)^{d+1}(r_1+\dots + r_d) =(-1)^d( r_0 - r_1-\dots -r_d)\). This determinant is nonzero as long as \(r_0 \ne \sum _{j=1}^d r_j\).
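As a numerical sanity check, the closed form of \(\det (M_d)\) can be compared with a direct computation. A minimal sketch in Python, assuming (as the base case \(\det (M_1) = -r_0+r_1\) suggests) that \(\varvec{Q}\) is the one-variable design matrix \([\varvec{1}_d \,|\, I_d]\):

```python
import numpy as np

def det_Md(r):
    """Determinant of M_d = (Q; R) with the assumed one-variable design
    Q = [1_d | I_d] and contrast row R = (r_0, r_1, ..., r_d)."""
    d = len(r) - 1
    Q = np.hstack([np.ones((d, 1)), np.eye(d)])     # d x (d+1)
    M = np.vstack([Q, np.asarray(r, dtype=float)])  # (d+1) x (d+1)
    return np.linalg.det(M)

def closed_form(r):
    """(-1)^d (r_0 - r_1 - ... - r_d)."""
    d = len(r) - 1
    return (-1) ** d * (r[0] - sum(r[1:]))

# check the closed form against direct determinants for several d
rng = np.random.default_rng(0)
for d in range(1, 6):
    r = rng.normal(size=d + 1)
    assert abs(det_Md(r) - closed_form(r)) < 1e-9
```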
Now we compute the inverse of the matrix \(M_d\) by direct inversion.
Let us check the inverse of \(M_d\)
So as long as \(r_0 \ne \sum _{j=1}^d r_j\)
Alternatively, the system (24) is equivalent to
and for \((\varvec{Q}\, \varvec{R})\) of full rank, the matrix \((\varvec{Q}'\varvec{Q} + \varvec{R}'\varvec{R})\) is invertible and \( {\varvec{\vartheta }} = (\varvec{Q}'\varvec{Q} + \varvec{R}'\varvec{R})^{-1}\varvec{Q}'\varvec{g({\bar{Y}})}. \)\(\square \)
Examples—Choice of the contrast vector \(\varvec{R}\)
- 1.
Taking \(r_0=1, \varvec{r}=\varvec{0}\) leads to \( -r_0 + \varvec{r} \varvec{1}_d=-1 \Rightarrow \widehat{\varvec{\vartheta }}_n = \left( \begin{array}{c} 0 \\ \varvec{g({\bar{Y}})} \end{array}\right) . \)
- 2.
Taking \(r_0=0, \varvec{r}=(1,\varvec{0})\) leads to
$$\begin{aligned} -r_0 + \varvec{r} \varvec{1}_d=1 \Rightarrow \widehat{\varvec{\vartheta }}_n = \left( \begin{array}{c} g({\bar{Y}}_n^{(1)})\\ 0\\ g({\bar{Y}}_n^{(2)}) - g({\bar{Y}}_n^{(1)})\\ \vdots \\ g({\bar{Y}}_n^{(d)}) - g({\bar{Y}}_n^{(1)}) \end{array}\right) . \end{aligned}$$
- 3.
Taking \(r_0=0, \varvec{r}=\varvec{1}\) leads to
$$\begin{aligned} -r_0 + \varvec{r} \varvec{1}_d=d \Rightarrow \widehat{\varvec{\vartheta }}_n = \left( \begin{array}{c} \overline{\varvec{g({\bar{Y}})}}\\ g({\bar{Y}}_n^{(1)}) - \overline{\varvec{g({\bar{Y}})}}\\ \dots \\ g({\bar{Y}}_n^{(d)}) - \overline{\varvec{g({\bar{Y}})}} \end{array}\right) , \text { with } \overline{\varvec{g({\bar{Y}})}} = \dfrac{1}{d}\displaystyle \sum _{j=1}^dg(\overline{Y}_n^{(j)}). \end{aligned}$$
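The three choices of contrast vector above can be verified numerically against the closed form \(\widehat{\varvec{\vartheta }}_n = (\varvec{Q}'\varvec{Q} + \varvec{R}'\varvec{R})^{-1}\varvec{Q}'\varvec{g({\bar{Y}})}\). A minimal sketch, assuming the one-variable design \(\varvec{Q}=[\varvec{1}_d\,|\,I_d]\) and stand-in values for \(\varvec{g({\bar{Y}})}\):

```python
import numpy as np

d = 4
g = np.array([0.5, -1.2, 2.0, 0.3])           # stand-in for g(Ybar^(j)), j = 1..d
Q = np.hstack([np.ones((d, 1)), np.eye(d)])   # assumed one-variable design

def mle(R):
    """theta_hat = (Q'Q + R'R)^{-1} Q' g(Ybar) under contrast row R."""
    R = np.asarray(R, dtype=float).reshape(1, -1)
    return np.linalg.solve(Q.T @ Q + R.T @ R, Q.T @ g)

# Case 1: r_0 = 1, r = 0  ->  zero intercept, effects g(Ybar)
t1 = mle([1.0] + [0.0] * d)
assert np.allclose(t1, np.concatenate([[0.0], g]))

# Case 2: r_0 = 0, r = (1, 0, ..., 0)  ->  first level as reference
t2 = mle([0.0, 1.0] + [0.0] * (d - 1))
assert np.allclose(t2, np.concatenate([[g[0], 0.0], g[1:] - g[0]]))

# Case 3: r_0 = 0, r = 1  ->  intercept mean(g), effects as deviations
t3 = mle([0.0] + [1.0] * d)
assert np.allclose(t3, np.concatenate([[g.mean()], g - g.mean()]))
```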
Proof of Remark 3.4
We have to solve the system
If \(\ell \) is injective, the system simplifies to
\(\square \)
Proof of Remark 3.5
Let \(Y_i\) be from the exponential family \(F_{exp}(a,b,c,\lambda ,\phi )\). It is well known that the moment generating function of \(Y_i\) is
Hence, the moment generating function of the average \({\overline{Y}}_m\) is
We thus recover the known result that \({\overline{Y}}_m\) belongs to the exponential family \(F_{exp}(x\mapsto a(x)/m,b,c,\lambda ,\phi )\) (e.g. McCullagh and Nelder 1989).
In our setting, the random variables in the average \(\overline{Y}_n^{(j)}\) are i.i.d. with functions a, b, c and parameters \(\lambda =\ell (\vartheta _{(1)}+\vartheta _{(j)})\) and \(\phi \). Hence \({\overline{Y}}_n^{(j)}\) also belongs to the exponential family with the same parameters but with the function \({\bar{a}}:x\mapsto a(x)/m_j\). In particular,
However, the computation of \(\mathbf {E}g({\overline{Y}}_n^{(j)})\) remains difficult unless g is a linear function. By the strong law of large numbers, as \(m_j\rightarrow +\infty \), the estimator is consistent since
By the Central Limit Theorem (i.e. \({\overline{Y}}_n^{(j)}\) converges in distribution to a normal distribution) and the Delta Method, we obtain the following
\(\square \)
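The consistency and delta-method asymptotics of this remark can be illustrated by simulation. A minimal sketch with hypothetical choices (exponentially distributed \(Y_i\) with mean \(\mu \) and \(g=\log \)), checking that \(\mathrm {Var}\,g({\overline{Y}}_m)\approx g'(\mu )^2\sigma ^2/m\):

```python
import numpy as np

rng = np.random.default_rng(42)
mu, m, reps = 2.0, 500, 5000   # hypothetical mean, group size, replications

# Y_i ~ Exponential with mean mu, so sigma^2 = mu^2; take g = log, g'(mu) = 1/mu
means = rng.exponential(scale=mu, size=(reps, m)).mean(axis=1)
g_means = np.log(means)

# Consistency: g(Ybar_m) concentrates around g(mu)
assert abs(g_means.mean() - np.log(mu)) < 0.01

# Delta method: Var g(Ybar_m) ~ g'(mu)^2 * sigma^2 / m = 1 / m here
assert abs(g_means.var() * m - 1.0) < 0.1
```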
Proof of Corollary 3.1
The log likelihood of \(\widehat{\varvec{\vartheta }}_n\) is defined by
In fact, we must verify that \(\ell ({\widehat{\eta }}_i)\) does not depend on the function g. If we consider \(\widehat{\varvec{\vartheta }}_n\) defined by (8), we have \(\varvec{Q}\widehat{\varvec{\vartheta }}_n = \varvec{g({\bar{y}})}\), since \(\widehat{\varvec{\vartheta }}_n\) is a solution of the system (23), i.e. \(\varvec{Q}(\varvec{Q}'\varvec{Q} + \varvec{R}'\varvec{R})^{-1}\varvec{Q}'=I\). Using \({\widehat{\eta }}_i= (\varvec{Q}\widehat{\varvec{\vartheta }}_n)_j\) for i such that \(x_i^{(2),j}=1\), we obtain
and
In the same way,
\(\square \)
1.2 Proof for the two-variable case
Proof of Theorem 3.2
The system \(S(\varvec{\vartheta })=0\) is
that is
The system has exactly \(1+d_2+d_3\) redundancies, and \(S(\varvec{\vartheta })=0\) reduces to
Hence the system has rank \({ KL}^\star \) and, if \(Y_i\) takes values in \(\mathbb {Y}\subset b'(\varLambda )\) and \(\ell \) is injective, we have
As in the proof of Theorem 3.1, we have to solve
that is, since \(\varvec{Q}\varvec{Q}'+\varvec{R}\varvec{R}'\) has full rank, as in the proof of Theorem 3.1,
In that case, the MLE solves a least-squares problem with response variable \(\varvec{g({\bar{Y}})}\), explanatory variables \(\varvec{Q}\), under the linear constraint given by \(\varvec{R}\).
- 1.
Under the linear contrasts (\({\tilde{C}}_0\)), model (10) is equivalent to model (6) with \(J=KL^\star \) modalities. Hence the solution is immediate.
- 2.
Under the linear contrasts (\({\tilde{C}}_\varSigma \)), the system
$$\begin{aligned} \vartheta _{(1)}+\vartheta _{(2),k} + \vartheta _{(3),l} + \vartheta _{kl} = g({\bar{Y}}_n^{(k,l)})\quad \forall (k,l)\in KL^\star \end{aligned}$$implies that
$$\begin{aligned} \sum _{(k,l)\in KL^\star }m_{k,l}(\vartheta _{(1)}+\vartheta _{(2),k} + \vartheta _{(3),l} + \vartheta _{kl}) = \sum _{(k,l)\in KL^\star }m_{k,l} g({\bar{Y}}_n^{(k,l)}). \end{aligned}$$Using
$$\begin{aligned} \sum _{(k,l)\in KL^\star }m_{k,l}&= n,\\ \sum _{(k,l)\in KL^\star }m_{k,l}\vartheta _{(2),k}&= \sum _{k\in K}\sum _{l\in L^\star _k}m_{k,l}\vartheta _{(2),k} = \sum _{k\in K}m^{(2)}_k\vartheta _{(2),k}= 0,\\ \sum _{(k,l)\in KL^\star }m_{k,l}\vartheta _{(3),l}&= \sum _{l\in L}\sum _{k\in K^\star _l}m_{k,l}\vartheta _{(3),l}= \sum _{l\in L}m^{(3)}_l\vartheta _{(3),l}= 0,\\ \sum _{(k,l)\in KL^\star }m_{k,l}\vartheta _{kl}&=0, \end{aligned}$$we get \(\vartheta _{(1)} = \dfrac{1}{n}\displaystyle \sum \nolimits _{(k,l)\in KL^\star }m_{k,l} g({\bar{Y}}_n^{(k,l)}).\) In the same way, taking the summation over \(K^\star _l\) for \(l\in L\) and over \(L^\star _k\) for \(k\in K\), we find \(\vartheta _{(2),k}\) and \(\vartheta _{(3),l}\), and then \(\vartheta _{kl}\).
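The earlier observation that the MLE solves a constrained least-squares problem can also be checked numerically. A minimal sketch, assuming for illustration the one-variable design \(\varvec{Q}=[\varvec{1}_d\,|\,I_d]\) with a sum-contrast row \(\varvec{R}\) (the two-variable design of Theorem 3.2 is not reproduced here), comparing the closed form with the KKT solution of \(\min \Vert \varvec{Q}\varvec{\vartheta }-\varvec{g}\Vert ^2\) subject to \(\varvec{R}\varvec{\vartheta }=0\):

```python
import numpy as np

d = 3
g = np.array([1.0, 2.5, -0.5])                 # stand-in for g(Ybar)
Q = np.hstack([np.ones((d, 1)), np.eye(d)])    # assumed one-variable design
R = np.array([[0.0, 1.0, 1.0, 1.0]])           # sum-contrast row

# Closed form of Theorem 3.1
theta_cf = np.linalg.solve(Q.T @ Q + R.T @ R, Q.T @ g)

# Constrained least squares via the KKT system:
#   minimize ||Q theta - g||^2   subject to   R theta = 0
K = np.block([[Q.T @ Q, R.T], [R, np.zeros((1, 1))]])
rhs = np.concatenate([Q.T @ g, [0.0]])
theta_ls = np.linalg.solve(K, rhs)[: d + 1]

assert np.allclose(theta_cf, theta_ls)         # same estimator
assert abs((R @ theta_cf).item()) < 1e-9       # constraint satisfied
```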
With main effects only, the system \(S(\varvec{\vartheta })=0\) is
There are \(1+d_2+d_3\) equations for \(1+d_2+d_3\) parameters, but the explanatory variables are collinear. So the two additional constraints \(\varvec{R}\varvec{\vartheta }=0\) ensure that a solution exists for the remaining \(d_2+d_3-1\) parameters. Using \(\sum _k x_i^{(2),k}=1\), the second set of equations becomes, \(\forall l\in L\),
Similarly, the third set of equations becomes \(\forall k\in K\)
Even with a canonical link \(\ell (x)=x\), so that \(\ell '(x)=1\), this system is not a least-squares problem for a nonlinear function g. \(\square \)
Calculus of the Log-likelihoods appearing in Sects. 4 and 5
Consider the Pareto GLM described in (13) and (15). The b function is \(b(\lambda ) = -\log (\lambda )\); using Corollary 3.1, we have \(\ell (\hat{\eta }_i) = (b')^{-1}(\overline{z}_n^{(j)})=-(\overline{z}_n^{(j)})^{-1}\) for j such that \(x_i^{(2),j}=1\) and
Let us compute the original log-likelihood of the Pareto 1 distribution:
Hence with \(z_i=-\log (y_i/\mu )\),
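As a numerical illustration of this closed form, one can simulate group-wise Pareto 1 samples and recover the shape parameter from \(\overline{z}_n^{(j)}\). A minimal sketch with hypothetical group parameters, assuming the threshold \(\mu \) is known:

```python
import numpy as np

rng = np.random.default_rng(1)
mu = 1.0                                   # known Pareto 1 threshold
lam_true = {"group1": 1.5, "group2": 3.0}  # hypothetical group-wise shapes

lam_hat = {}
for j, lam in lam_true.items():
    m_j = 100_000
    y = mu * rng.uniform(size=m_j) ** (-1.0 / lam)  # Pareto 1 draws by inversion
    z_bar = -np.log(y / mu).mean()                  # z_i = -log(y_i / mu)
    lam_hat[j] = -1.0 / z_bar                       # (b')^{-1}(zbar), b(l) = -log(l)
    assert abs(lam_hat[j] - lam) / lam < 0.02       # close to the true shape
```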
Now consider the shifted log-normal GLM described in (18) and (19). Here, the b function is \(b(\lambda )=\lambda ^2/2\); hence, using Corollary 3.1, we have \(\ell (\hat{\eta }_i) = (b')^{-1}(\overline{z}_n^{(j)})=\overline{z}_n^{(j)}\) for j such that \(x_i^{(2),j}=1\) and Eq. (21) holds.
Let us compute the original log-likelihood of the shifted log-normal distribution:
with \(z_i=\log (y_i-\mu )\). Hence
Using \( {\widehat{\phi }} = \frac{1}{n}\sum _{j\in J}\sum _{i, x_i^{(2),j}=1}\left( z_i - {\bar{z}}_n^{(j)}\right) ^2 \) leads to the desired result.
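These closed-form estimators \(\widehat{\lambda }_j={\bar{z}}_n^{(j)}\) and \({\widehat{\phi }}\) can be checked by simulation. A minimal sketch with hypothetical shift, dispersion, and group locations:

```python
import numpy as np

rng = np.random.default_rng(7)
mu, phi = 0.5, 0.25                  # hypothetical shift and dispersion
lam_true = {1: -0.3, 2: 0.8}         # hypothetical group locations lambda_j

resid, lam_hat = [], {}
for j, l in lam_true.items():
    z = rng.normal(l, np.sqrt(phi), size=50_000)  # z_i = log(y_i - mu)
    y = mu + np.exp(z)                            # shifted log-normal draws
    z_grp = np.log(y - mu)
    lam_hat[j] = z_grp.mean()                     # lambda_hat_j = zbar^(j)
    resid.append(z_grp - lam_hat[j])

# pooled (1/n) sum of squared deviations, as in the formula above
phi_hat = np.concatenate(resid).var()

assert all(abs(lam_hat[j] - lam_true[j]) < 0.02 for j in lam_true)
assert abs(phi_hat - phi) < 0.01
```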
Link functions and descriptive statistics
Cite this article
Brouste, A., Dutang, C. & Rohmer, T. Closed-form maximum likelihood estimator for generalized linear models in the case of categorical explanatory variables: application to insurance loss modeling. Comput Stat 35, 689–724 (2020). https://doi.org/10.1007/s00180-019-00918-7