Abstract
In species distribution modelling, records of species presence are often modelled as a realization of a spatial point process whose intensity is a function of environmental covariates. One way to fit a spatial point process model is to apply logistic regression to an artificial case–control sample consisting of the observed presence records combined with a simulated pattern of background points, usually a uniform random sample from within the study’s spatial domain. In this paper we propose local background sampling as an alternative to uniform background sampling when using logistic regression to fit spatial point process models to data. Our method is similar to the local case–control sampling procedure of Fithian and Hastie (Ann Stat 42:1693–1724, 2014), but differs in that background points are sampled with probability proportional to an initial intensity estimate based on a pilot point process model. We compare local background sampling with uniform background sampling in a simulation study and in an example modelling the distributions of bumble bees (genus Bombus) in Ontario, Canada. Our results show local background sampling to be more efficient than uniform background sampling in all simulated settings and across all species analysed.
Supplementary materials accompanying this paper appear online.
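The idea described in the abstract can be illustrated with a small simulation. The following is a minimal sketch only, not the authors' implementation: it assumes a one-dimensional domain [0, 1], a single covariate x(s) = s, a hypothetical pilot intensity, and an arbitrary background scale. It relies on the standard fact that, when presence points with intensity λ(s) are pooled with independent Poisson background points of intensity δ(s), a point at s is a presence with probability λ(s)/(λ(s) + δ(s)), so a logistic regression with offset −log δ(s) estimates the coefficients of log λ(s). All function and variable names are illustrative.

```python
import math
import random

rng = random.Random(1)

# Assumed "true" presence intensity on [0, 1]: lambda(s) = exp(5 + 2 * s).
TRUE_B0, TRUE_B1 = 5.0, 2.0

def true_intensity(s):
    return math.exp(TRUE_B0 + TRUE_B1 * s)

def pilot_intensity(s):
    # Hypothetical pilot estimate of the intensity (deliberately imperfect).
    return math.exp(4.5 + 1.5 * s)

BACKGROUND_SCALE = 2.0  # assumed constant controlling the number of background points

def background_intensity(s):
    # Local background sampling: intensity proportional to the pilot estimate.
    return BACKGROUND_SCALE * pilot_intensity(s)

def simulate_poisson_process(intensity, max_intensity, rng):
    """Simulate an inhomogeneous Poisson process on [0, 1] by thinning."""
    points = []
    t = rng.expovariate(max_intensity)
    while t < 1.0:
        if rng.random() < intensity(t) / max_intensity:
            points.append(t)
        t += rng.expovariate(max_intensity)
    return points

def fit_logistic_with_offset(x, y, offset, n_iter=25):
    """Newton-Raphson for logit P(y = 1) = b0 + b1 * x + offset."""
    n1 = sum(y)
    n0 = len(y) - n1
    # Start where the average linear predictor matches the observed log odds.
    b0 = math.log(n1 / n0) - sum(offset) / len(offset)
    b1 = 0.0
    for _ in range(n_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi, oi in zip(x, y, offset):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi + oi)))
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        b0 += step0
        b1 += step1
    return b0, b1

# Both intensities increase in s, so their maximum on [0, 1] is at s = 1.
presence = simulate_poisson_process(true_intensity, true_intensity(1.0), rng)
background = simulate_poisson_process(
    background_intensity, background_intensity(1.0), rng
)

locations = presence + background
labels = [1] * len(presence) + [0] * len(background)
offsets = [-math.log(background_intensity(s)) for s in locations]

b0_hat, b1_hat = fit_logistic_with_offset(locations, labels, offsets)
print(f"presence points: {len(presence)}, background points: {len(background)}")
print(f"estimated log-intensity coefficients: b0 = {b0_hat:.2f}, b1 = {b1_hat:.2f}")
```

In this sketch, uniform background sampling corresponds to making `background_intensity` return a constant; the rest of the fitting step is unchanged, which makes it straightforward to compare the two schemes on the same simulated presence pattern.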
References
Aarts, G., Fieberg, J., and Matthiopoulos, J. (2012). Comparative interpretation of count, presence–absence and point methods for species distribution models. Methods in Ecology and Evolution 3, 177–187.
Baddeley, A. (2018). A statistical commentary on mineral prospectivity analysis. In Daya Sagar, B. S., Cheng, Q., and Agterberg, F., editors, Handbook of Mathematical Geosciences, pp. 25–65. Springer, Cham.
Baddeley, A., Berman, M., Fisher, N. I., Hardegen, A., Milne, R. K., Schuhmacher, D., Shah, R., and Turner, R. (2010). Spatial logistic regression and change-of-support in Poisson point processes. Electronic Journal of Statistics 4, 1151–1201.
Baddeley, A., Rubak, E., and Turner, R. (2015). Spatial Point Patterns: Methodology and Applications with R. Chapman and Hall/CRC Press, London.
Baddeley, A. and Turner, R. (2000). Practical maximum pseudolikelihood for spatial point patterns (with discussion). Australian & New Zealand Journal of Statistics 42, 283–322.
Barbet-Massin, M., Jiguet, F., Albert, C. H., and Thuiller, W. (2012). Selecting pseudo-absences for species distribution models: how, where and how many? Methods in Ecology and Evolution 3, 327–338.
Berman, M. and Turner, T. R. (1992). Approximating point process likelihoods with GLIM. Applied Statistics 41, 31–38.
Cameron, S. A., Lozier, J. D., Strange, J. P., Koch, J. B., Cordes, N., Solter, L. F., and Griswold, T. L. (2011). Patterns of widespread decline in North American bumble bees. Proceedings of the National Academy of Sciences 108, 662–667.
Colla, S. R. (2016). Status, threats and conservation recommendations for wild bumble bees (Bombus spp.) in Ontario, Canada: a review for policymakers and practitioners. Natural Areas Journal 36, 412–427.
Diggle, P. (1985). A kernel method for smoothing point process data. Journal of the Royal Statistical Society: Series C (Applied Statistics) 34, 138–147.
Elith, J. and Leathwick, J. R. (2009). Species distribution models: ecological explanation and prediction across space and time. Annual Review of Ecology, Evolution, and Systematics 40, 677–697.
Elith, J., Phillips, S. J., Hastie, T., Dudík, M., Chee, Y. E., and Yates, C. J. (2011). A statistical explanation of MaxEnt for ecologists. Diversity and Distributions 17, 43–57.
Feng, X., Castro, M. C., Linde, E., and Papeş, M. (2017). Armadillo Mapper: A case study of an online application to update estimates of species’ potential distributions. Tropical Conservation Science 10, 1–5.
Fithian, W. and Hastie, T. (2013). Finite-sample equivalence in statistical models for presence-only data. The Annals of Applied Statistics 7, 1917–1939.
Fithian, W. and Hastie, T. (2014). Local case-control sampling: efficient subsampling in imbalanced data sets. The Annals of Statistics 42, 1693–1724.
Fois, M., Fenu, G., Lombrana, A. C., Cogoni, D., and Bacchetta, G. (2015). A practical method to speed up the discovery of unknown populations using species distribution models. Journal for Nature Conservation 24, 42–48.
Franklin, J. (2010). Mapping Species Distributions: Spatial Inference and Prediction. Cambridge University Press, Cambridge, UK.
GBIF (2019). GBIF occurrence download. https://doi.org/10.15468/dl.fvby3r.
Goulson, D., Lye, G. C., and Darvill, B. (2008). Decline and conservation of bumble bees. Annual Review of Entomology 53, 191–208.
Guisan, A., Thuiller, W., and Zimmermann, N. E. (2017). Habitat Suitability and Distribution Models with Applications in R. Cambridge University Press, Cambridge, UK.
Guisan, A., Tingley, R., Baumgartner, J. B., Naujokaitis-Lewis, I., Sutcliffe, P. R., Tulloch, A. I., Regan, T. J., Brotons, L., McDonald-Madden, E., and Mantyka-Pringle, C. (2013). Predicting species distributions for conservation decisions. Ecology Letters 16, 1424–1435.
Hijmans, R. J., Cameron, S. E., Parra, J. L., Jones, P. G., and Jarvis, A. (2005). Very high resolution interpolated climate surfaces for global land areas. International Journal of Climatology 25, 1965–1978.
Hijmans, R. J. and Graham, C. H. (2006). The ability of climate envelope models to predict the effect of climate change on species distributions. Global Change Biology 12, 2272–2281.
Jiménez-Valverde, A., Peterson, A. T., Soberón, J., Overton, J., Aragón, P., and Lobo, J. M. (2011). Use of niche models in invasive species risk assessments. Biological Invasions 13, 2785–2797.
Klein, A.-M., Vaissiere, B. E., Cane, J. H., Steffan-Dewenter, I., Cunningham, S. A., Kremen, C., and Tscharntke, T. (2006). Importance of pollinators in changing landscapes for world crops. Proceedings of the Royal Society B: Biological Sciences 274, 303–313.
Lobo, J. M., Jiménez-Valverde, A., and Real, R. (2008). AUC: a misleading measure of the performance of predictive distribution models. Global Ecology and Biogeography 17, 145–151.
Merow, C., Smith, M. J., and Silander Jr, J. A. (2013). A practical guide to MaxEnt for modeling species’ distributions: what it does, and why inputs and settings matter. Ecography 36, 1058–1069.
Naimi, B., Hamm, N. A. S., Groen, T. A., Skidmore, A. K., and Toxopeus, A. G. (2014). Where is positional uncertainty a problem for species distribution modelling? Ecography 37, 191–203.
Pearce, J. L. and Boyce, M. S. (2006). Modelling distribution and abundance with presence-only data. Journal of Applied Ecology 43, 405–412.
Peterson, A. T., Soberón, J., Pearson, R. G., Anderson, R. P., Martínez-Meyer, E., Nakamura, M., and Araújo, M. B. (2011). Ecological Niches and Geographic Distributions. Princeton University Press, Princeton, NJ.
Phillips, S. J., Anderson, R. P., and Schapire, R. E. (2006). Maximum entropy modeling of species geographic distributions. Ecological Modelling 190, 231–259.
Phillips, S. J., Dudík, M., Elith, J., Graham, C. H., Lehmann, A., Leathwick, J., and Ferrier, S. (2009). Sample selection bias and presence-only distribution models: implications for background and pseudo-absence data. Ecological Applications 19, 181–197.
Phillips, S. J., Dudík, M., and Schapire, R. E. (2017). Maxent software for modeling species niches and distributions (Version 3.4.1). http://biodiversityinformatics.amnh.org/open_source/maxent/.
Renner, I. W., Elith, J., Baddeley, A., Fithian, W., Hastie, T., Phillips, S. J., Popovic, G., and Warton, D. I. (2015). Point process models for presence-only analysis. Methods in Ecology and Evolution 6, 366–379.
Renner, I. W. and Warton, D. I. (2013). Equivalence of MAXENT and Poisson point process models for species distribution modeling in ecology. Biometrics 69, 274–281.
Rinnhofer, L. J., Roura-Pascual, N., Arthofer, W., Dejaco, T., Thaler-Knoflach, B., Wachter, G. A., Christian, E., Steiner, F. M., and Schlick-Steiner, B. C. (2012). Iterative species distribution modelling and ground validation in endemism research: an alpine jumping bristletail example. Biodiversity and Conservation 21, 2845–2863.
Snäll, T., Kindvall, O., Nilsson, J., and Pärt, T. (2011). Evaluating citizen-based presence data for bird monitoring. Biological Conservation 144, 804–810.
Thurman, A. L. and Zhu, J. (2014). Variable selection for spatial Poisson point processes via a regularization method. Statistical Methodology 17, 113–125.
Valavi, R., Elith, J., Lahoz-Monfort, J. J., and Guillera-Arroita, G. (2019). blockCV: An R package for generating spatially or environmentally separated folds for k-fold cross-validation of species distribution models. Methods in Ecology and Evolution 10, 225–232.
Warton, D. and Aarts, G. (2013). Advancing our thinking in presence-only and used-available analysis. Journal of Animal Ecology 82, 1125–1134.
Warton, D. I. and Shepherd, L. C. (2010). Poisson point process models solve the “pseudo-absence problem” for presence-only data in ecology. The Annals of Applied Statistics 4, 1383–1402.
Acknowledgements
Funding was provided by the Natural Sciences and Engineering Research Council of Canada (Discovery Grant 261497-2011-RGPIN).
Cite this article
Daniel, J., Horrocks, J. & Umphrey, G.J. Efficient Modelling of Presence-Only Species Data via Local Background Sampling. JABES 25, 90–111 (2020). https://doi.org/10.1007/s13253-019-00380-4
DOI: https://doi.org/10.1007/s13253-019-00380-4