Abstract
Demand for groundwater, one of the principal freshwater resources, is increasing rapidly, so there is an urgent need for novel prediction systems that estimate groundwater potential more accurately and support informed groundwater resource management. Ensemble machine learning methods are generally reported to produce more accurate results. However, proposing novel ensemble models is not enough; comparative studies that evaluate their performance are equally essential for identifying suitable methods. The current study therefore assesses the performance of four ensemble models: the boosted generalized additive model (GamBoost), adaptive boosting classification trees (AdaBoost), bagged classification and regression trees (Bagged CART), and random forest (RF). To build the models, the locations of 339 groundwater resources and the spatial groundwater potential conditioning factors were used. The recursive feature elimination (RFE) method was then applied to identify the key features. RFE indicated that the best number of features for groundwater potential modeling was 12 of the 15 variables (with a mean accuracy of about 0.84). The modeling results indicated that the bagging models (RF and Bagged CART) performed better than the boosting models (AdaBoost and GamBoost). Overall, the RF model outperformed the other models (accuracy = 0.86, Kappa = 0.67, precision = 0.85, and recall = 0.91). The predictive variables topographic position index, valley depth, drainage density, elevation, and distance from stream contributed most to the modeling process. The groundwater potential maps predicted in this study can help water resources managers and policymakers in watershed and aquifer management to maintain optimal exploitation of this important freshwater resource.
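The four scores reported for the models (accuracy, Kappa, precision, and recall) can all be derived from a binary confusion matrix of predicted versus observed groundwater presence. The following is a minimal sketch of those standard definitions, not the authors' code: the names `ConfusionMatrix` and `evaluate` are hypothetical, and the counts in the usage lines are invented for illustration and are not the study's data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    """Binary confusion matrix (hypothetical helper, not from the paper)."""
    tp: int  # presence predicted, presence observed
    fp: int  # presence predicted, absence observed
    fn: int  # absence predicted, presence observed
    tn: int  # absence predicted, absence observed


def evaluate(cm: ConfusionMatrix) -> dict:
    """Return accuracy, Cohen's kappa, precision, and recall."""
    n = cm.tp + cm.fp + cm.fn + cm.tn
    if n == 0:
        raise ValueError("confusion matrix is empty")

    accuracy = (cm.tp + cm.tn) / n
    predicted_pos = cm.tp + cm.fp
    observed_pos = cm.tp + cm.fn
    precision = cm.tp / predicted_pos if predicted_pos else 0.0
    recall = cm.tp / observed_pos if observed_pos else 0.0

    # Agreement expected by chance: product of the marginal
    # proportions, summed over the two classes.
    p_yes = (predicted_pos / n) * (observed_pos / n)
    p_no = ((cm.fn + cm.tn) / n) * ((cm.fp + cm.tn) / n)
    p_e = p_yes + p_no

    # Kappa rescales accuracy so that 0 means chance-level agreement.
    kappa = (accuracy - p_e) / (1 - p_e) if p_e < 1 else 0.0

    return {
        "accuracy": accuracy,
        "kappa": kappa,
        "precision": precision,
        "recall": recall,
    }


# Illustrative counts only (assumed, not taken from the study).
scores = evaluate(ConfusionMatrix(tp=40, fp=10, fn=5, tn=45))
for name, value in scores.items():
    print(f"{name}: {value:.2f}")
```

With these invented counts, accuracy is 0.85 while Kappa is 0.70, which illustrates why Kappa is typically lower than accuracy: it discounts the agreement a model would reach by chance alone.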
Data Availability
Not applicable.
Acknowledgements
We acknowledge the support of the Alexander von Humboldt Foundation.
Ethics declarations
Conflicts of interest/Competing interests
Not applicable.
Code Availability
Not applicable.
Cite this article
Mosavi, A., Sajedi Hosseini, F., Choubin, B. et al. Ensemble Boosting and Bagging Based Machine Learning Models for Groundwater Potential Prediction. Water Resour Manage 35, 23–37 (2021). https://doi.org/10.1007/s11269-020-02704-3
DOI: https://doi.org/10.1007/s11269-020-02704-3