
Efficient Modelling of Presence-Only Species Data via Local Background Sampling

Journal of Agricultural, Biological and Environmental Statistics

Abstract

In species distribution modelling, records of species presence are often modelled as a realization of a spatial point process whose intensity is a function of environmental covariates. One way to fit a spatial point process model is to apply logistic regression to an artificial case–control sample consisting of the observed presence records combined with a simulated pattern of background points, usually a uniform random sample from within the study's spatial domain. In this paper we propose local background sampling as an alternative to uniform background sampling when using logistic regression to fit spatial point process models to data. Our method is similar to the local case–control sampling procedure of Fithian and Hastie (Ann Stat 42:1693–1724, 2014), but differs in that background points are sampled with probability proportional to an initial intensity estimate based on a pilot point process model. We compare local background sampling with uniform background sampling in a simulation study and in an example modelling the distributions of bumble bees (genus Bombus) in Ontario, Canada. Our results show local background sampling to be more efficient than uniform background sampling in all simulated settings and across all species analysed.
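The following is a minimal Python sketch of the two background designs, not the authors' implementation. For illustration only, it assumes a one-dimensional study region [0, 1] with the single covariate x(s) = s, a log-linear intensity λ(s) = exp(b0 + b1 s), and a log-linear pilot intensity exp(a0 + a1 s); the pilot intercept a0 does not change the sampling density, so only the slope a1 is passed. If background points have intensity ρ(s), a point of the combined sample located at s is a presence with probability λ(s)/{λ(s) + ρ(s)}, so the logistic regression carries the offset −log ρ(s). Under uniform sampling the offset is constant and is absorbed into the intercept; under local sampling it varies with s. Handling the non-uniform design through this offset is our assumption (for a log-linear pilot it is equivalent to adding the pilot slope back to the fitted slope, as in local case–control sampling), and the paper's exact estimator may differ. All function names are hypothetical.

```python
import math
import random


def sigmoid(eta):
    # Numerically stable logistic function.
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    e = math.exp(eta)
    return e / (1.0 + e)


def log1pexp(eta):
    # Numerically stable log(1 + exp(eta)).
    return max(eta, 0.0) + math.log1p(math.exp(-abs(eta)))


def simulate_presences(b0, b1, rng):
    """Poisson process on [0, 1] with intensity exp(b0 + b1*s), simulated by thinning."""
    lam_max = math.exp(b0 + max(b1, 0.0))  # maximum of the intensity on [0, 1]
    points, s = [], rng.expovariate(lam_max)
    while s < 1.0:
        if rng.random() < math.exp(b0 + b1 * s) / lam_max:
            points.append(s)
        s += rng.expovariate(lam_max)
    return points


def log_background_intensity(s, m, a1):
    """log rho(s) when m background points have density proportional to exp(a1*s)."""
    if a1 == 0.0:
        return math.log(m)  # uniform background on the unit interval
    return math.log(m) + math.log(a1 / math.expm1(a1)) + a1 * s


def sample_background(m, a1, rng):
    """m background points with density proportional to exp(a1*s); a1 = 0 is uniform."""
    if a1 == 0.0:
        return [rng.random() for _ in range(m)]
    # Inverse-CDF sampling from the truncated exponential density on [0, 1].
    return [math.log1p(rng.random() * math.expm1(a1)) / a1 for _ in range(m)]


def fit_offset_logistic(s, y, offset, beta, max_iter=50):
    """Newton-Raphson with step halving for logit P(y=1|s) = b0 + b1*s + offset."""

    def loglik(b):
        total = 0.0
        for si, yi, oi in zip(s, y, offset):
            eta = b[0] + b[1] * si + oi
            total += yi * eta - log1pexp(eta)
        return total

    ll = loglik(beta)
    for _ in range(max_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for si, yi, oi in zip(s, y, offset):
            p = sigmoid(beta[0] + beta[1] * si + oi)
            w = p * (1.0 - p)
            g0, g1 = g0 + (yi - p), g1 + (yi - p) * si
            h00, h01, h11 = h00 + w, h01 + w * si, h11 + w * si * si
        det = h00 * h11 - h01 * h01
        step = ((h11 * g0 - h01 * g1) / det, (h00 * g1 - h01 * g0) / det)
        t = 1.0
        while t > 1e-8:
            cand = (beta[0] + t * step[0], beta[1] + t * step[1])
            new_ll = loglik(cand)
            if new_ll >= ll:
                break
            t /= 2.0
        else:
            break  # no ascent step found: keep the current estimate
        done = new_ll - ll < 1e-10
        beta, ll = cand, new_ll
        if done:
            break
    return beta


def fit_ppm(presences, m, a1, rng):
    """Estimate (b0, b1) of exp(b0 + b1*s) from presences plus m background points."""
    background = sample_background(m, a1, rng)
    s = presences + background
    y = [1] * len(presences) + [0] * m
    # Offset -log rho(s), evaluated at presence and background locations alike.
    offset = [-log_background_intensity(si, m, a1) for si in s]
    start = (math.log(len(presences)), 0.0)  # homogeneous Poisson fit on [0, 1]
    return fit_offset_logistic(s, y, offset, start)


if __name__ == "__main__":
    rng = random.Random(1)
    presences = simulate_presences(5.0, 2.0, rng)
    uniform_fit = fit_ppm(presences, 2000, 0.0, rng)
    pilot = fit_ppm(presences, 200, 0.0, rng)  # cheap pilot model
    local_fit = fit_ppm(presences, 2000, pilot[1], rng)
    print("uniform:", uniform_fit, "local:", local_fit)
```

Both fits target the same (b0, b1). A single run does not by itself show the efficiency gain reported in the paper; that would require comparing the sampling variances of the two estimators over many simulated replicates.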

Supplementary materials accompanying this paper appear online.


Fig. 1. Source: GBIF occurrence download, https://doi.org/10.15468/dl.fvby3r

Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6


References

  • Aarts, G., Fieberg, J., and Matthiopoulos, J. (2012). Comparative interpretation of count, presence–absence and point methods for species distribution models. Methods in Ecology and Evolution 3, 177–187.

  • Baddeley, A. (2018). A statistical commentary on mineral prospectivity analysis. In Daya Sagar, B. S., Cheng, Q., and Agterberg, F., editors, Handbook of Mathematical Geosciences, pp. 25–65. Springer, Cham.

  • Baddeley, A., Berman, M., Fisher, N. I., Hardegen, A., Milne, R. K., Schuhmacher, D., Shah, R., and Turner, R. (2010). Spatial logistic regression and change-of-support in Poisson point processes. Electronic Journal of Statistics 4, 1151–1201.

  • Baddeley, A., Rubak, E., and Turner, R. (2015). Spatial Point Patterns: Methodology and Applications with R. Chapman and Hall/CRC Press, London.

  • Baddeley, A. and Turner, R. (2000). Practical maximum pseudolikelihood for spatial point patterns (with discussion). Australian & New Zealand Journal of Statistics 42, 283–322.

  • Barbet-Massin, M., Jiguet, F., Albert, C. H., and Thuiller, W. (2012). Selecting pseudo-absences for species distribution models: how, where and how many? Methods in Ecology and Evolution 3, 327–338.

  • Berman, M. and Turner, T. R. (1992). Approximating point process likelihoods with GLIM. Applied Statistics 41, 31–38.

  • Cameron, S. A., Lozier, J. D., Strange, J. P., Koch, J. B., Cordes, N., Solter, L. F., and Griswold, T. L. (2011). Patterns of widespread decline in North American bumble bees. Proceedings of the National Academy of Sciences 108, 662–667.

  • Colla, S. R. (2016). Status, threats and conservation recommendations for wild bumble bees (Bombus spp.) in Ontario, Canada: a review for policymakers and practitioners. Natural Areas Journal 36, 412–427.

  • Diggle, P. (1985). A kernel method for smoothing point process data. Journal of the Royal Statistical Society: Series C (Applied Statistics) 34, 138–147.

  • Elith, J. and Leathwick, J. R. (2009). Species distribution models: ecological explanation and prediction across space and time. Annual Review of Ecology, Evolution, and Systematics 40, 677–697.

  • Elith, J., Phillips, S. J., Hastie, T., Dudík, M., Chee, Y. E., and Yates, C. J. (2011). A statistical explanation of MaxEnt for ecologists. Diversity and Distributions 17, 43–57.

  • Feng, X., Castro, M. C., Linde, E., and Papeş, M. (2017). Armadillo Mapper: A case study of an online application to update estimates of species' potential distributions. Tropical Conservation Science 10, 1–5.

  • Fithian, W. and Hastie, T. (2013). Finite-sample equivalence in statistical models for presence-only data. The Annals of Applied Statistics 7, 1917–1939.

  • Fithian, W. and Hastie, T. (2014). Local case-control sampling: efficient subsampling in imbalanced data sets. The Annals of Statistics 42, 1693–1724.

  • Fois, M., Fenu, G., Lombrana, A. C., Cogoni, D., and Bacchetta, G. (2015). A practical method to speed up the discovery of unknown populations using species distribution models. Journal for Nature Conservation 24, 42–48.

  • Franklin, J. (2010). Mapping Species Distributions: Spatial Inference and Prediction. Cambridge University Press, Cambridge, UK.

  • GBIF (2019). GBIF occurrence download. https://doi.org/10.15468/dl.fvby3r.

  • Goulson, D., Lye, G. C., and Darvill, B. (2008). Decline and conservation of bumble bees. Annual Review of Entomology 53, 191–208.

  • Guisan, A., Thuiller, W., and Zimmermann, N. E. (2017). Habitat Suitability and Distribution Models with Applications in R. Cambridge University Press, Cambridge, UK.

  • Guisan, A., Tingley, R., Baumgartner, J. B., Naujokaitis-Lewis, I., Sutcliffe, P. R., Tulloch, A. I., Regan, T. J., Brotons, L., McDonald-Madden, E., and Mantyka-Pringle, C. (2013). Predicting species distributions for conservation decisions. Ecology Letters 16, 1424–1435.

  • Hijmans, R. J., Cameron, S. E., Parra, J. L., Jones, P. G., and Jarvis, A. (2005). Very high resolution interpolated climate surfaces for global land areas. International Journal of Climatology 25, 1965–1978.

  • Hijmans, R. J. and Graham, C. H. (2006). The ability of climate envelope models to predict the effect of climate change on species distributions. Global Change Biology 12, 2272–2281.

  • Jiménez-Valverde, A., Peterson, A. T., Soberón, J., Overton, J., Aragón, P., and Lobo, J. M. (2011). Use of niche models in invasive species risk assessments. Biological Invasions 13, 2785–2797.

  • Klein, A.-M., Vaissiere, B. E., Cane, J. H., Steffan-Dewenter, I., Cunningham, S. A., Kremen, C., and Tscharntke, T. (2006). Importance of pollinators in changing landscapes for world crops. Proceedings of the Royal Society B: Biological Sciences 274, 303–313.

  • Lobo, J. M., Jiménez-Valverde, A., and Real, R. (2008). AUC: a misleading measure of the performance of predictive distribution models. Global Ecology and Biogeography 17, 145–151.

  • Merow, C., Smith, M. J., and Silander Jr, J. A. (2013). A practical guide to MaxEnt for modeling species' distributions: what it does, and why inputs and settings matter. Ecography 36, 1058–1069.

  • Naimi, B., Hamm, N. A. S., Groen, T. A., Skidmore, A. K., and Toxopeus, A. G. (2014). Where is positional uncertainty a problem for species distribution modelling? Ecography 37, 191–203.

  • Pearce, J. L. and Boyce, M. S. (2006). Modelling distribution and abundance with presence-only data. Journal of Applied Ecology 43, 405–412.

  • Peterson, A. T., Soberón, J., Pearson, R. G., Anderson, R. P., Martínez-Meyer, E., Nakamura, M., and Araújo, M. B. (2011). Ecological Niches and Geographic Distributions. Princeton University Press, Princeton, NJ.

  • Phillips, S. J., Anderson, R. P., and Schapire, R. E. (2006). Maximum entropy modeling of species geographic distributions. Ecological Modelling 190, 231–259.

  • Phillips, S. J., Dudík, M., Elith, J., Graham, C. H., Lehmann, A., Leathwick, J., and Ferrier, S. (2009). Sample selection bias and presence-only distribution models: implications for background and pseudo-absence data. Ecological Applications 19, 181–197.

  • Phillips, S. J., Dudík, M., and Schapire, R. E. (2017). Maxent software for modeling species niches and distributions (Version 3.4.1). http://biodiversityinformatics.amnh.org/open_source/maxent/.

  • Renner, I. W., Elith, J., Baddeley, A., Fithian, W., Hastie, T., Phillips, S. J., Popovic, G., and Warton, D. I. (2015). Point process models for presence-only analysis. Methods in Ecology and Evolution 6, 366–379.

  • Renner, I. W. and Warton, D. I. (2013). Equivalence of MAXENT and Poisson point process models for species distribution modeling in ecology. Biometrics 69, 274–281.

  • Rinnhofer, L. J., Roura-Pascual, N., Arthofer, W., Dejaco, T., Thaler-Knoflach, B., Wachter, G. A., Christian, E., Steiner, F. M., and Schlick-Steiner, B. C. (2012). Iterative species distribution modelling and ground validation in endemism research: an alpine jumping bristletail example. Biodiversity and Conservation 21, 2845–2863.

  • Snäll, T., Kindvall, O., Nilsson, J., and Pärt, T. (2011). Evaluating citizen-based presence data for bird monitoring. Biological Conservation 144, 804–810.

  • Thurman, A. L. and Zhu, J. (2014). Variable selection for spatial Poisson point processes via a regularization method. Statistical Methodology 17, 113–125.

  • Valavi, R., Elith, J., Lahoz-Monfort, J. J., and Guillera-Arroita, G. (2019). blockCV: An R package for generating spatially or environmentally separated folds for k-fold cross-validation of species distribution models. Methods in Ecology and Evolution 10, 225–232.

  • Warton, D. and Aarts, G. (2013). Advancing our thinking in presence-only and used-available analysis. Journal of Animal Ecology 82, 1125–1134.

  • Warton, D. I. and Shepherd, L. C. (2010). Poisson point process models solve the "pseudo-absence problem" for presence-only data in ecology. The Annals of Applied Statistics 4, 1383–1402.


Acknowledgements

Funding was provided by the Natural Sciences and Engineering Research Council of Canada (Discovery Grant 261497-2011-RGPIN).

Author information

Correspondence to Jeffrey Daniel.


Electronic supplementary material


Supplementary material 1 (zip 37343 KB)


About this article


Cite this article

Daniel, J., Horrocks, J. & Umphrey, G.J. Efficient Modelling of Presence-Only Species Data via Local Background Sampling. JABES 25, 90–111 (2020). https://doi.org/10.1007/s13253-019-00380-4


  • DOI: https://doi.org/10.1007/s13253-019-00380-4
