‘Batteries’ in Machine Learning: A First Experimental Assessment of Inference for Siberian Crane Breeding Grounds in the Russian High Arctic Based on ‘Shaving’ 74 Predictors

Machine Learning for Ecology and Sustainable Natural Resource Management

Abstract

The Siberian crane (Leucogeranus leucogeranus) remains an elusive but highly regarded species of global conservation concern. Its breeding regions lie in the Russian high Arctic, and two subpopulations are known. Here we present, for the first time, a machine learning-based summer habitat analysis that uses nesting data for the eastern population on the breeding grounds and predictive modeling with 74 GIS predictors. Parsimony is commonly sought to make models more interpretable, but findings generally show that it does not yield the greatest improvement to the model or to inference. ‘Batteries’ are a new concept in machine learning: sets of experiments that help to test predictors and support model selection. Here we present 28 of those ‘batteries’ and compare multiple approaches to model runs, from iteratively dropping the least or most important predictor (‘variable shaving’) to allowing all predictors to contribute. The generic ‘kitchen sink’ model with TreeNet (an optimized boosting algorithm from Salford Systems Ltd) was found to perform best. However, while ‘batteries’ remain widely underused in wildlife conservation management, ‘shaving’ was of great use for learning about the structure, role and impacts of predictors and their spatial performance, which supports non-parsimonious work. Of great interest is the finding that a bundle of low-ranked predictors performs almost as well as, or better than, the so-called top predictors. This is called ‘predictor swapping’. This is the best and most detailed habitat study and prediction for the Siberian crane in summer thus far. It is intended for use in conservation management and as a generic template for any species, as both data availability and the environmental crisis are on the rise, specifically in the high Arctic.
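The ‘variable shaving’ idea described above can be illustrated with a minimal sketch. This is not the TreeNet battery used in the chapter: the importance function below (`toy_importance`), the data, and all names are hypothetical stand-ins chosen so the example runs with the Python standard library alone. In an actual shaving run, a model would presumably be refit and its performance recorded at every step.

```python
from statistics import mean

def toy_importance(column, labels):
    """Toy stand-in for a model-based importance score (an assumption, not
    TreeNet): absolute difference between the predictor's mean at presence
    rows (label 1) and at absence rows (label 0)."""
    present = [x for x, y in zip(column, labels) if y == 1]
    absent = [x for x, y in zip(column, labels) if y == 0]
    return abs(mean(present) - mean(absent))

def shave(data, labels, rank=toy_importance, drop="least"):
    """Iteratively drop one predictor per step and record each step.

    data: dict mapping predictor name -> list of values.
    drop: "least" removes the lowest-ranked predictor, "most" the highest.
    Returns a list of (predictors_in_model, dropped_predictor) tuples;
    the final tuple has dropped_predictor set to None.
    """
    remaining = dict(data)
    steps = []
    while len(remaining) > 1:
        # Re-rank the predictors that are still in the model.
        scores = {name: rank(col, labels) for name, col in remaining.items()}
        pick = min if drop == "least" else max
        victim = pick(scores, key=scores.get)
        steps.append((sorted(remaining), victim))
        del remaining[victim]
    steps.append((sorted(remaining), None))
    return steps

# Hypothetical toy data: three predictors at six sites (1 = nest present).
labels = [1, 1, 1, 0, 0, 0]
data = {
    "bio12": [9.0, 8.0, 9.5, 2.0, 1.0, 1.5],   # strongly separates the classes
    "slope": [3.0, 4.0, 3.5, 2.5, 3.0, 2.0],   # weak signal
    "aspect": [1.0, 2.0, 3.0, 1.0, 2.0, 3.0],  # no signal
}

for kept, dropped in shave(data, labels):
    print(len(kept), "predictors:", kept, "-> drop", dropped)
```

Calling `shave(data, labels, drop="most")` runs the opposite experiment, removing the highest-ranked predictor first, which is the kind of comparison that makes ‘predictor swapping’ visible.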

References

  • Akaike H (1974) A new look at the statistical model identification. IEEE Trans Automat Contr AC-19:716–723

  • Arnold TW (2010) Uninformative parameters and model selection using Akaike’s information criterion. J Wildl Manag 74:1175–1178

  • Barbet-Massin M, Jiguet F, Albert CH, Thuiller W (2012) Selecting pseudo-absences for species distribution models: how, where and how many? Methods Ecol Evol 3:327–338. https://doi.org/10.1111/j.2041-210X.2011.00172.x

  • BirdLife International (2001) Threatened birds of Asia: the BirdLife International red data book, vol 1. BirdLife International, Cambridge

  • Breiman L (2001) Statistical modeling: the two cultures (with comments and a rejoinder by the author). Stat Sci 16:199–231

  • Breiman L, Friedman J, Stone CJ, Olshen RA (1984) Classification and regression trees. CRC Press, Boca Raton

  • Burnham KP, Anderson DR (2002) Model selection and multimodel inference: a practical information-theoretic approach. Springer, New York

  • Cai T, Huettmann F, Guo Y (2014) Using stochastic gradient boosting to infer stopover habitat selection and distribution of hooded cranes Grus monacha during spring migration in Lindian, Northeast China. PLoS ONE 9. https://doi.org/10.1371/journal.pone.0097372

  • Chamberlin TC (1890) The method of multiple working hypotheses. Science 15:92–96

  • Elith J, Graham CH, Anderson RP, Dudík M, Ferrier S, Guisan A, Hijmans RJ, Huettmann F, Leathwick JR, Lehmann A, Li J, Lohmann LG, Loiselle BA, Manion G, Moritz C, Nakamura M, Nakazawa Y, Overton JMcC, Peterson AT, Phillips SJ, Richardson K, Scachetti-Pereira R, Schapire RE, Soberón J, Williams S, Wisz MS, Zimmermann NE (2006) Novel methods improve prediction of species’ distributions from occurrence data. Ecography 29:129–151

  • Fielding A (1999) Machine learning methods for ecological applications. Springer, Boston

  • Fielding A, Bell JF (1997) A review of methods for the assessment of prediction errors in conservation presence/absence models. Environ Conserv 24:38–49

  • Friedman JH (2001) Greedy function approximation: a gradient boosting machine. Ann Stat 29:1189–1232

  • Friedman JH (2002) Stochastic gradient boosting. Comp Stat Data Anal 38:367–378

  • Guthery FS, Brennan LA, Peterson MJ, Lusk LL (2005) Information theory in wildlife science: critique and viewpoint. J Wildl Manag 69:457–465

  • Han X, Guo Y, Mi C, Huettmann F, Wen L (2017) Machine learning model analysis of breeding habitats for the Black-necked Crane in Central Asian uplands under anthropogenic pressures. Sci Rep 7:6114. https://doi.org/10.1038/s41598-017-06167-2

  • Hastie T, Tibshirani R, Friedman J (2009) The elements of statistical learning: data mining, inference, and prediction, 2nd edn. Springer, New York

  • Herrick KA, Huettmann F, Lindgren MA (2013) A global model of avian influenza prediction in wild birds: the importance of northern regions. Vet Res. https://doi.org/10.1186/1297-9716-44-42

  • Hilborn R, Mangel M (1997) The ecological detective: confronting models with data. Princeton University Press, Princeton

  • Hochachka W, Caruana R, Fink D, Munson A, Riedewald M, Sorokina D, Kelling S (2007) Data mining for discovery of pattern and process in ecological systems. J Wildl Manag 71:2427–2437

  • Jiao S, Guo Y, Huettmann F, Lei G (2014) Nest-site selection analysis of hooded crane (Grus monacha) in northeastern China based on a multivariate ensemble model. Zool Sci 31:430–437

  • Kandel K, Huettmann F, Suwal MK, Regmi GR, Nijman V, Nekaris KAI, Lama ST, Thapa A, Sharma HP, Subedi TR (2015) Rapid multi-nation distribution assessment of a charismatic conservation species using open access ensemble model GIS predictions: red panda (Ailurus fulgens) in the Hindu-Kush Himalaya region. Biol Conserv 181:150–161

  • Kanai Y, Ueta M, Germogenov N, Nagendran M, Mita N, Higuchi H (2002) Migration routes and important resting areas of Siberian cranes (Grus leucogeranus) between northeastern Siberia and China as revealed by satellite tracking. Biol Conserv 106:339–346

  • Klein DR, Magomedova M (2003) Industrial development and wildlife in arctic ecosystems: can learning from the past lead to a brighter future? In: Rasmussen RO, Koroleva NE (eds) Social and environmental impacts in the North. Kluwer Academic Publishers, The Netherlands, pp 35–56

  • Mace G, Cramer W, Diaz S, Faith DP, Larigauderie A, Le Prestre P, Palmer M, Perrings C, Scholes RJ, Walpole M, Walter BA, Watson JEM, Mooney HA (2010) Biodiversity targets after 2010. Curr Opin Environ Sustain 2:3–8

  • Manly BFJ, McDonald LL, Thomas DL, McDonald TL, Erickson WP (2002) Resource selection by animals: statistical design and analysis for field studies, 2nd edn. Kluwer Academic Publishers, Netherlands

  • Matthiessen P (2001) The birds of heaven: travels with cranes. North Point Press, New York

  • McGarigal K, Cushman S, Stafford S (2000) Multivariate statistics for wildlife and ecology research. Springer, New York

  • Mi C, Huettmann F, Guo Y, Han X, Wen L (2017) Why choose random forest to predict rare species distribution with few samples in large undersampled areas? Three Asian crane species models provide supporting evidence. PeerJ. https://doi.org/10.7717/peerj.2849

  • Moore GS, Ilyashenko E (2009) Regional flyway education programs: increasing public awareness of crane conservation along the crane flyways of Eurasia and North America. In: Prentice C (ed) Conservation of flyway wetlands in East and West/Central Asia. Proceedings of the project completion workshop of the UNEP/GEF Siberian Crane wetland project, 14–15 October 2009, Harbin, China. International Crane Foundation, Baraboo (Wisconsin), USA

  • Mueller JP, Massaron L (2016) Machine learning for dummies. For Dummies Publisher, 435 p

  • Ohse B, Huettmann F, Ickert-Bond S, Juday G (2009) Modeling the distribution of white spruce (Picea glauca) for Alaska with high accuracy: an open access role-model for predicting tree species in last remaining wilderness areas. Polar Biol 32:1717–1724

  • Prentice C (ed) (2010) Conservation of flyway wetlands in East and West/Central Asia. Proceedings of the project completion workshop of the UNEP/GEF Siberian Crane wetland project, 14–15 October 2009, Harbin, China. International Crane Foundation, Baraboo (Wisconsin), USA

  • Sorokin AG, Kotyukov YV (1987) Discovery of the nesting ground of the Ob River population of the Siberian Crane. In: Archibald GW, Pasquier RF (eds) Proceedings of the 1983 international crane workshop. International Crane Foundation, Baraboo, pp 209–212

  • Sorokin A, Markin Y (1996) New nesting site of Siberian Cranes. Newsletter of Russian Bird Conservation Union, Moscow

  • Spiridonov V, Gavrilo M, Krasnov MA, Nikolaeva N, Sergienko L, Popov A, Krasnova E (2011) Toward the new role of marine and coastal protected areas in the Arctic: the Russian case. In: Huettmann F (ed) Protection of the three poles. Springer, New York

  • Silvy NJ (2012) The wildlife techniques manual: research and management, vol 2, 7th edn. Johns Hopkins University Press, Baltimore

  • Van Impe J (2013) Esquisse de l’avifaune de la Sibérie Occidentale: une revue bibliographique. Alauda 81:269–296

  • Wu G, Leeuw J, Skidmore AK, Prins HHT, Best EPH, Liu Y (2009) Will the Three Gorges Dam affect the underwater light climate of Vallisneria spiralis L. and food habitat of Siberian Crane in Poyang Lake? Hydrobiologia 623:213–222

  • Yu C, Yinghao W, Qing Y (2008) Ground survey of waterbirds in the Poyang Lake region in winter 2007/2008. Siberian Crane Flyway News: 15

Acknowledgement

We thank Dan Steinberg and Salford Systems Ltd. for a workshop with U.S. IALE at Snowbird, Utah, to introduce us to the power of batteries. FH acknowledges the kind and long collaboration with the Forestry University of Beijing, China, and the use of their data. U.S. IALE and S. Linke, C. Cambu, H. Hera, H. Berrios Alvarez and the -EWHALE lab- at UAF, are thanked for their support. This is EWHALE lab publication #185.

Author information

Corresponding author

Correspondence to Falk Huettmann.

Appendices

Appendix 1: Details of 74 GIS Environmental layers Used in the Model Prediction (+ 3 Additional Internal Columns)

| # | Name of GIS layer | Abbreviation | Source | Comment |
|---|---|---|---|---|
| 1–12 | Monthly mean temperature | tmen_1–12 | Worldclim.org | These are standard layers used for GIS modeling |
| 13–24 | Monthly minimum temperature | tmin_1–12 | Worldclim.org | (see above) |
| 25–36 | Monthly maximum temperature | tmax_1–12 | Worldclim.org | (see above) |
| 37–48 | Monthly precipitation | prec_1–12 | Worldclim.org | (see above) |
| 49–67 | Bioclim | bio_1–19 | worldclim.org/bioclim | (see above) |
| 68 | Altitude | | Worldclim.org | (see above) |
| 69 | Aspect | | Worldclim.org | (see above) |
| 70 | Slope | | Worldclim.org | (see above) |
| 71 | Landcover | Landcv | Herrick et al. (2013) | Several global landcover layers exist |
| 72 | Human infrastructure index | Hii | Herrick et al. (2013) | Human footprint; several human footprint layers exist |
| 73 | Distance to waterbody/lake | Dislke | Mi unpublished | While essential for cranes, this layer is unlikely to be very accurate due to the huge and ephemeral wetlands worldwide |
| 74 | Distance to coastline | Discsln | Mi unpublished | Relies on the coastline map resolution |
| 75 | x coordinate | | ArcGIS | Not often used in most GIS model work but important for geo-referencing |
| 76 | y coordinate | | ArcGIS | Not often used in most GIS model work but important for geo-referencing |
| 77 | Row index | FID | ArcGIS | Not often used in most GIS model work but important for row identification |
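As a quick consistency check on the layer list above, the following sketch expands the abbreviations into individual column names and confirms that they add up to 74 predictors plus 3 internal columns. The exact column names are assumptions: the table gives no abbreviation for altitude, aspect, slope or the coordinates, so `altitude`, `aspect`, `slope`, `x` and `y` are hypothetical labels, and the `prefix_number` pattern is inferred from entries such as `tmen_1–12`.

```python
def numbered(prefix, n):
    """Expand an abbreviation such as 'tmen' into tmen_1 ... tmen_n."""
    return [f"{prefix}_{i}" for i in range(1, n + 1)]

predictors = (
    numbered("tmen", 12)      # monthly mean temperature (layers 1-12)
    + numbered("tmin", 12)    # monthly minimum temperature (13-24)
    + numbered("tmax", 12)    # monthly maximum temperature (25-36)
    + numbered("prec", 12)    # monthly precipitation (37-48)
    + numbered("bio", 19)     # bioclim variables (49-67)
    + ["altitude", "aspect", "slope"]          # 68-70; names are assumed
    + ["Landcv", "Hii", "Dislke", "Discsln"]   # 71-74; abbreviations from the table
)

# Internal columns 75-77; 'x' and 'y' are assumed names, 'FID' is from the table.
internal = ["x", "y", "FID"]

print(len(predictors))                   # 74
print(len(predictors) + len(internal))   # 77
```

Keeping the internal columns in a separate list makes it harder to pass coordinates or row identifiers to a model as predictors by accident.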

Appendix 2

1.1 List of Top 20 Predictors, as identified by TreeNet ranking

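| Predictor | Relative Importance |
|---|---|
| Bio12 | 100.0 |
| Bio14 | 71.2 |
| Bio17 | 44.2 |
| TMEN9 | 40.1 |
| Prec12 | 37.6 |
| Distance to lake | 35.1 |
| TMAX12 | 29.8 |
| Altitude | 27.3 |
| Slope | 25.9 |
| Tmin1 | 23.8 |
| Bio1 | 23.0 |
| Bio19 | 20.4 |
| Tmen2 | 19.2 |
| Bio3 | 18.9 |
| Tmax3 | 17.9 |
| Bio6 | 16.3 |
| Tmen7 | 15.9 |
| Prec6 | 14.3 |
| Prec7 | 13.9 |
| Tmin6 | 12.9 |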
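The ranking above lends itself to building reduced predictor sets such as those named in Appendix 4 (‘Top5 model’, ‘Top10 model’). The sketch below assumes that ‘TopN’ simply means the N highest-ranked predictors in this list; the source page does not spell out the exact composition of those models, and the function name `top_n` is hypothetical. Models that reach beyond the 20 listed predictors (for example Top29, Top35 or Bottom 44) cannot be reconstructed from this table alone.

```python
# Relative importance values copied from the Appendix 2 table.
ranking = [
    ("Bio12", 100.0), ("Bio14", 71.2), ("Bio17", 44.2), ("TMEN9", 40.1),
    ("Prec12", 37.6), ("Distance to lake", 35.1), ("TMAX12", 29.8),
    ("Altitude", 27.3), ("Slope", 25.9), ("Tmin1", 23.8), ("Bio1", 23.0),
    ("Bio19", 20.4), ("Tmen2", 19.2), ("Bio3", 18.9), ("Tmax3", 17.9),
    ("Bio6", 16.3), ("Tmen7", 15.9), ("Prec6", 14.3), ("Prec7", 13.9),
    ("Tmin6", 12.9),
]

def top_n(ranking, n):
    """Return the names of the n predictors with the highest importance."""
    if n > len(ranking):
        raise ValueError("ranking lists fewer predictors than requested")
    ordered = sorted(ranking, key=lambda item: item[1], reverse=True)
    return [name for name, _ in ordered[:n]]

top5 = top_n(ranking, 5)
top10 = top_n(ranking, 10)
print(top5)
print(top10)
```

Sorting inside `top_n` means the function does not rely on the table already being in descending order.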

Appendix 3

1.1 Prediction Model Details for the Best Performing Model (the ‘Kitchen sink model’ with 74 predictors)

Siberian crane with a battery run on TreeNet (SPM7), balanced

The ‘kitchen sink’ model, with all 74 environmental predictors

figure a

Frequency of predicted Relative Index of Occurrence (RIO, 0–1) for known presence (1)

figure b
figure c

Appendix 4

(For Prediction map 1 for the ‘Kitchen sink model’ see Fig. 8.4 in the text; for map legends please see this figure; same for all other appendix maps)

(For Prediction map 2 for the ‘TMax12 model’ see Fig. 8.5 in the text)

1.1 Prediction Map 3 for the ‘BIO14 model’

figure d

1.2 Prediction Map 4 for the ‘TMax12BIO14 model’

figure e

1.3 Prediction Map 5 for the ‘Top5 model’

figure f

1.4 Prediction Map 6 for the ‘Top10 model’

figure g

1.5 Prediction Map 7 for the ‘Top29 model’

figure h

1.6 Prediction Map 8 for the ‘Top35 model’

figure i

1.7 Prediction Map 9 for the ‘Bottom 44 model’

figure j

1.8 Prediction Map 10 for the ‘Leaving out top 3 interacting predictors model’

figure k

Copyright information

© 2018 Springer Nature Switzerland AG

Cite this chapter

Huettmann, F., Mi, C., Guo, Y. (2018). ‘Batteries’ in Machine Learning: A First Experimental Assessment of Inference for Siberian Crane Breeding Grounds in the Russian High Arctic Based on ‘Shaving’ 74 Predictors. In: Humphries, G., Magness, D., Huettmann, F. (eds) Machine Learning for Ecology and Sustainable Natural Resource Management. Springer, Cham. https://doi.org/10.1007/978-3-319-96978-7_8