Abstract
In this paper, case-deletion diagnostics for beta regression models are proposed. The diagnostics are based on the distance between the distributions of the maximum likelihood estimates of the model parameters obtained from the entire sample and those obtained after removing a single case. Two metrics between probability distributions are considered: the Fréchet distance (Fréchet in Comptes Rendus hebdomadaires des séances de l'Académie des Sciences de Paris 244:689–692, 1957) and the Rao distance (Rao in Sankhya Indian J Stat Ser A 9:246–291, 1949). Moreover, a jackknife-after-bootstrap transformation of the diagnostics is proposed to make clearer the decision about which cases should be considered influential. Artificial and real examples are included to illustrate the usefulness of the diagnostics and to compare them with other diagnostics in the literature.
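As a rough illustration of the Fréchet-based idea, the sketch below assumes that each vector of maximum likelihood estimates is approximated by a multivariate normal distribution whose mean is the estimate and whose covariance is its estimated asymptotic covariance matrix. Under that assumption, the Fréchet distance between two normal distributions has the closed form of Dowson and Landau (1982): d² = ‖μ₁ − μ₂‖² + tr(Σ₁ + Σ₂ − 2(Σ₁^{1/2} Σ₂ Σ₁^{1/2})^{1/2}). The function names `frechet_distance_normal` and `case_deletion_frechet` are hypothetical. The normal approximation is an assumption of this sketch and may differ from the paper's exact construction. Refitting the beta regression without case i to obtain the deleted estimates is not shown.

```python
import numpy as np


def _psd_sqrt(a):
    """Symmetric square root of a symmetric positive semi-definite matrix."""
    vals, vecs = np.linalg.eigh(a)
    vals = np.clip(vals, 0.0, None)  # guard against tiny negative round-off
    return (vecs * np.sqrt(vals)) @ vecs.T  # V diag(sqrt(vals)) V^T


def frechet_distance_normal(mu1, cov1, mu2, cov2):
    """Fréchet distance between N(mu1, cov1) and N(mu2, cov2).

    Closed form (Dowson and Landau, 1982):
    d^2 = ||mu1 - mu2||^2 + tr(cov1 + cov2 - 2 (cov1^{1/2} cov2 cov1^{1/2})^{1/2}).
    """
    mu1, mu2 = np.asarray(mu1, float), np.asarray(mu2, float)
    cov1, cov2 = np.asarray(cov1, float), np.asarray(cov2, float)
    s1 = _psd_sqrt(cov1)
    middle = s1 @ cov2 @ s1
    middle = (middle + middle.T) / 2.0  # symmetrize against round-off
    # Trace of the square root of a symmetric PSD matrix = sum of sqrt(eigenvalues).
    cross = np.sum(np.sqrt(np.clip(np.linalg.eigvalsh(middle), 0.0, None)))
    d2 = np.sum((mu1 - mu2) ** 2) + np.trace(cov1) + np.trace(cov2) - 2.0 * cross
    return float(np.sqrt(max(d2, 0.0)))


def case_deletion_frechet(theta_full, cov_full, deleted_fits):
    """Distance between the full-sample approximation and each case-deleted one.

    deleted_fits: iterable of (theta_minus_i, cov_minus_i) pairs, one per case
    (hypothetical input; producing it requires refitting the model n times).
    """
    return np.array([
        frechet_distance_normal(theta_full, cov_full, theta_i, cov_i)
        for theta_i, cov_i in deleted_fits
    ])


# Illustrative numbers only (not from the paper): two "deleted" fits in 1D.
print(case_deletion_frechet([0.0], [[1.0]], [([3.0], [[1.0]]), ([0.0], [[4.0]])]))
```

Larger values indicate that deleting the case moves the estimated sampling distribution further. Computing the trace term from the eigenvalues of the symmetric matrix Σ₁^{1/2} Σ₂ Σ₁^{1/2} avoids a general (non-symmetric) matrix square root. How to turn these values into a decision, for instance through the jackknife-after-bootstrap transformation mentioned above, is not covered by this sketch.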
References
Anholeto, T., Sandoval, M.C., Botter, D.A.: Adjusted Pearson residuals in beta regression models. J. Stat. Comput. Simul. 84(5), 999–1014 (2014)
Atkinson, C., Mitchell, A.F.S.: Rao’s distance measure. Sankhya Indian J. Stat. Ser. A 43(3), 345–365 (1981)
Beyaztas, U., Alin, A.: Jackknife-after-bootstrap as logistic regression diagnostic tool. Commun. Stat. Simul. Comput. 43(9), 2047–2060 (2014)
Breton, C.V., Siegmund, K.D., Joubert, B.R., et al.: Prenatal tobacco smoke exposure is associated with childhood DNA CpG methylation. PLoS ONE 9(6), e99716 (2014)
Chien, L.C.: Multiple deletion diagnostics in beta regression models. Comput. Stat. 28, 1639–1661 (2012)
Cressie, N., Read, T.R.C.: Multinomial goodness-of-fit tests. J. R. Stat. Soc. Ser. B 46, 440–464 (1984)
Cribari-Neto, F., Zeileis, A.: Beta regression in R. J. Stat. Softw. 34, 1–24 (2010)
Dowson, D.C., Landau, B.V.: Fréchet's distance between multivariate normal distributions. J. Multivar. Anal. 12, 450–455 (1982)
Espinheira, P.L., Ferrari, S.L., Cribari-Neto, F.: On beta regression residuals. J. Appl. Stat. 35, 407–419 (2008a)
Espinheira, P.L., Ferrari, S.L., Cribari-Neto, F.: Influence diagnostics in beta regression. Comput. Stat. Data Anal. 52, 4417–4431 (2008b)
Ferrari, S.L., Cribari-Neto, F.: Beta regression for modelling rates and proportions. J. Appl. Stat. 31, 799–815 (2004)
Ferrari, S.L., Espinheira, P.L., Cribari-Neto, F.: Diagnostic tools in beta regression with varying dispersion. Stat. Neerl. 65(3), 337–351 (2011)
Ferrari, S.L., Pinheiro, E.C.: Improved likelihood inference in beta regression. J. Stat. Comput. Simul. 81, 431–443 (2012)
Fréchet, M.: Sur la distance de deux lois de probabilité. Comptes Rendus hebdomadaires des séances de l'Académie des Sciences de Paris 244, 689–692 (1957)
Galvis, D.M., Bandyopadhyay, D., Lachos, V.H.: Augmented mixed beta regression models for periodontal proportion data. Stat. Med. 33, 3759–3771 (2014)
García-Heras, J., Muñoz-García, J., Muñoz-Pichardo, J.M., Pardo, L.: Influence measures based on Cressie–Read divergence measures in multivariate linear model. Commun. Stat. Theory Methods 35, 2055–2073 (2006)
Hadi, A.S., Nyquist, H.: Fréchet's distance as a tool for diagnosing multivariate data. Linear Algebra Appl. 289, 183–201 (1999)
Han, S., Zhang, H., Lockett, G.A., Mukherjee, N., Holloway, J.W., Karmaus, W.: Identifying heterogeneous transgenerational DNA methylation sites via clustering in beta regression. Ann. Appl. Stat. 9(4), 2052–2072 (2015)
Hunger, M., Döring, A., Holle, R.: Longitudinal beta regression models for analyzing health-related quality of life scores over time. BMC Med. Res. Methodol. 12(144), 1–12 (2012)
Jiménez-Gamero, M.D., Muñoz-Pichardo, J.M., Muñoz-García, J., Pascual, A.: Rao distance as a measure of influence in multivariate linear model. J. Appl. Stat. 29(6), 841–854 (2002)
Johnson, R.W.: Fitting percentage of body fat to simple body measurements. J. Stat. Educ. 4(1) (1996). https://doi.org/10.1080/10691898.1996.11910505
Martin, M.A., Roberts, S.: Jackknife-after-bootstrap regression influence diagnostics. J. Nonparametric Stat. 22, 257–269 (2010)
Martin, M.A., Roberts, S., Zheng, L.: Delete-2 and delete-3 jackknife procedures for unmasking in regression. Aust. N. Z. J. Stat. 52(1), 45–60 (2010)
Muñoz-García, J., Muñoz-Pichardo, J.M., Pardo, L.: Cressie and Read power-divergences as influence measures for logistic regression models. Comput. Stat. Data Anal. 50, 3199–3221 (2006)
Muñoz-Pichardo, J.M., Enguix, A., Muñoz, J., Pascual, A.: Fréchet's metric as measure of influence in multivariate linear models with random errors elliptically distributed. Comput. Stat. Data Anal. 46, 469–491 (2004)
Muñoz-Pichardo, J.M., Moreno-Rebollo, J.L., Enguix, A., Pascual, A.: Influence measures on profile analysis with elliptical data through Fréchet's metric. Metrika 68, 111–127 (2008)
Ospina, R., Ferrari, S.L.P.: A general class of zero-or-one inflated beta regression models. Comput. Stat. Data Anal. 56, 1609–1623 (2012)
Pereira, T.L., Cribari-Neto, F.: Detecting model misspecification in inflated beta regressions. Commun. Stat. Simul. Comput. 43(3), 631–656 (2014)
Pregibon, D.: Logistic regression diagnostics. Ann. Stat. 9, 705–724 (1981)
R Core Team: R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria (2017)
Rao, C.R.: On the distance between two populations. Sankhya Indian J. Stat. Ser. A 9, 246–291 (1949)
Roberts, S., Martin, M.A., Zheng, L.: An adaptive, automatic multiple-case deletion technique for detecting influence in regression. Technometrics 57(3), 408–417 (2015)
Rocha, A.V., Simas, A.B.: Influence diagnostics in a general class of beta regression models. Test 20, 95–119 (2011)
Swearingen, C.J., Tilley, B.C., Adams, R.C., et al.: Application of beta regression to analyze ischemic stroke volume in NINDS rt-PA clinical trials. Neuroepidemiology 37, 73–82 (2011)
Wei, B.C., Hu, Y.Q., Fung, W.K.: Generalized leverage and its applications. Scand. J. Stat. 25, 25–37 (1998)
Cite this article
Muñoz-Pichardo, J.M., Moreno-Rebollo, J.L., Pino-Mejías, R. et al. Influence measures in beta regression models through distance between distributions. AStA Adv Stat Anal 103, 163–185 (2019). https://doi.org/10.1007/s10182-018-00332-2
DOI: https://doi.org/10.1007/s10182-018-00332-2