Predicting Nitrate Concentration and Its Spatial Distribution in Groundwater Resources Using Support Vector Machines (SVMs) Model

Arabgol, Raheleh; Sartaj, Majid; Asghari, Keyvan

doi:10.1007/s10666-015-9468-0

Published: 05 June 2015

Volume 21, pages 71–82, (2016)

Environmental Modeling & Assessment

Raheleh Arabgol¹,
Majid Sartaj¹ &
Keyvan Asghari²


Abstract

In this paper, a support vector machine (SVM) model was developed to predict nitrate concentration in the groundwater of Arak plain, Iran. The model predicts nitrate concentration from a set of easily measurable groundwater quality variables used as inputs: water temperature, electrical conductivity, groundwater depth, total dissolved solids, dissolved oxygen, pH, land use, and season of the year. The data set comprised 160 water samples representing 40 different wells monitored for 1 year. The parameters of the optimum SVM model were obtained using a combination of 4-fold cross-validation and a grid search technique. The optimum model was then used to predict nitrate concentration in the Arak plain aquifer. The SVM model predicted nitrate concentration in the training and test data sets with reasonably high correlation with the measured values (0.92 and 0.87, respectively) and low root mean squared errors (0.086 and 0.111, respectively). Finally, maps of nitrate concentration in groundwater were prepared for all four seasons using the trained SVM model and a geographic information system (GIS) interpolation scheme, and were compared with the results of a physics-based (flow and contaminant) model. Overall, the results showed that the SVM model could be used as a fast, reliable, and cost-effective method for assessing and predicting groundwater quality.
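The parameter selection described in the abstract (grid search scored by 4-fold cross-validation, with correlation and root mean squared error as performance measures) can be illustrated with a minimal standard-library sketch. Everything specific here is an assumption made for illustration: the data are synthetic, the parameter name `lam` and its grid are hypothetical, and a one-dimensional ridge regression stands in for the SVM, whose actual parameters and grid values are not given in this excerpt.

```python
import math
import random
from itertools import product

def rmse(y_true, y_pred):
    """Root mean squared error between measured and predicted values."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def pearson_r(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

def k_fold_indices(n_samples, k, seed=0):
    """Shuffle the sample indices and deal them into k disjoint folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def grid_search_cv(fit, predict, X, y, grid, k=4, seed=0):
    """Return the parameter set with the lowest mean validation RMSE over k folds."""
    folds = k_fold_indices(len(X), k, seed)
    names = sorted(grid)
    best_params, best_score = None, float("inf")
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        scores = []
        for held_out in folds:
            held = set(held_out)
            train = [i for i in range(len(X)) if i not in held]
            model = fit([X[i] for i in train], [y[i] for i in train], **params)
            preds = [predict(model, X[i]) for i in held_out]
            scores.append(rmse([y[i] for i in held_out], preds))
        mean_score = sum(scores) / k
        if mean_score < best_score:
            best_params, best_score = params, mean_score
    return best_params, best_score

# Stand-in model (NOT an SVM): one-dimensional ridge regression through the origin.
def fit_ridge(xs, ys, lam):
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)

def predict_ridge(w, x):
    return w * x

# Synthetic data with the same sample count as the study (160), for illustration only.
rng = random.Random(1)
X = [rng.uniform(0.0, 1.0) for _ in range(160)]
y = [2.0 * x + rng.gauss(0.0, 0.05) for x in X]
best_params, best_rmse = grid_search_cv(
    fit_ridge, predict_ridge, X, y, {"lam": [0.0, 0.1, 1.0, 10.0]}
)
```

With 160 samples and k = 4, each fold holds out 40 samples. Replacing `fit_ridge` and `predict_ridge` with an SVR implementation would leave the search loop unchanged.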

References

1. Khalil, A., Almasri, M. N., McKee, M., & Kaluarachchi, J. J. (2005). Applicability of statistical learning algorithms in groundwater quality modeling. Water Resources Research, 41, W05010. doi:10.1029/2004WR003608.
2. Yoon, H., Jun, J., Hyun, Y., Bae, G., & Lee, K. (2011). Comparative study of artificial neural networks and support vector machines for predicting groundwater levels in a coastal aquifer. Journal of Hydrology, 396, 128–138.
3. Babiker, I. S., Mohamed, M. A. A., Terao, H., Kato, K., & Ohta, K. (2003). Assessment of groundwater contamination by nitrate leaching from intensive vegetable cultivation using geographical information system. Environment International, 29, 1009–1017.
4. Thirumalaivasan, D., Karmegam, M., & Venugopal, K. (2003). AHP-DRASTIC: software for specific aquifer vulnerability assessment using DRASTIC model and GIS. Environmental Modelling & Software, 18(7), 645–656.
5. Kalivarapu, V., & Winer, E. (2008). A multi-fidelity software framework for interactive modeling of advective and diffusive contaminant transport in groundwater. Environmental Modelling & Software, 23(12), 1370–1383.
6. Chesnaux, R., & Allen, D. M. (2008). Simulating nitrate leaching profiles in a highly permeable vadose zone. Environmental Modeling and Assessment, 13, 527–539.
7. Tutmez, B., & Hatipoglu, Z. (2010). Comparing two data driven interpolation methods for modeling nitrate distribution in aquifer. Ecological Informatics, 5, 311–315.
8. Almasri, M. N., & Kaluarachchi, J. J. (2007). Modular neural networks to predict the nitrate distribution in ground water using the on-ground nitrogen loading and recharge data. Journal of Hydrology, 343, 211–229.
9. Almasri, M. N., & Kaluarachchi, J. J. (2004). Implications of on-ground nitrogen loading and soil transformations on ground water quality management. Journal of the American Water Resources Association, 40, 165–186.
10. Schnobrich, M. R., Chaplin, B. P., Semmens, M. J., & Novak, P. J. (2007). Stimulating hydrogenotrophic denitrification in simulated groundwater containing high dissolved oxygen and nitrate concentrations. Water Research, 41(9), 1869–1876.
11. Gardner, K. K., & Vogel, R. M. (2005). Predicting ground water nitrate concentration from land use. Ground Water, 43(3), 343–352.
12. USEPA (U.S. Environmental Protection Agency). (2009). Edition of the Drinking Water Standards and Health Advisories. EPA 822-R-09-011, Office of Water, Washington, USA.
13. Wagner, B. J. (1992). Simultaneous parameter estimation and contaminant source characterization for coupled groundwater flow and contaminant transport modeling. Journal of Hydrology, 135, 275–303.
14. Hassan, A., & Hamed, K. H. (2001). Prediction of plume migration in heterogeneous media using artificial neural networks. Water Resources Research, 37(3), 605–623.
15. Kunstmann, H., Kinzelbach, W., & Siegfried, T. (2002). Conditional first order second moment method and its application to the quantification of uncertainty in groundwater modeling. Water Resources Research, 38(4), 1035. doi:10.1029/2000WR000022.
16. Liu, S., Tucker, P., & Mansell, M. (2010). A conceptual nitrate transport model and its application at different scales. Environmental Modeling and Assessment, 15, 251–259.
17. Almasri, M. N., & Kaluarachchi, J. J. (2005). Multi-criteria decision analysis for the optimal management of nitrate contamination of aquifers. Journal of Environmental Management, 74, 365–381.
18. Dixon, B. (2009). A case study using support vector machines, neural networks and logistic regression in a GIS to identify wells contaminated with nitrate-N. Hydrogeology Journal, 17, 1507–1520.
19. Maier, H. R., Jain, A., Dandy, G. C., & Sudheer, K. P. (2010). Methods used for the development of neural networks for the prediction of water resource variables in river systems: current status and future directions. Environmental Modelling & Software, 25, 891–909.
20. Liu, J. P., Chang, M. Q., & Ma, X. Y. (2009). Groundwater quality assessment based on support vector machine. HAIHE River Basin Research and Planning Approach: Proceedings of 2009 International Symposium of HAIHE Basin Integrated Water and Environment Management, Beijing, China, 173–178.
21. Vapnik, V. N. (1998). Statistical learning theory. New York: John Wiley.
22. Dibike, Y. B., Velickov, S., Solomatine, D. P., & Abbott, M. B. (2001). Model induction with support vector machines: introduction and application. ASCE Journal of Computing in Civil Engineering, 15(3), 208–216.
23. Liong, S. Y., & Sivapragasam, C. (2002). Flood stage forecasting with support vector machines. Journal of the American Water Resources Association, 38(1), 173–186.
24. Asefa, T., Kemblowski, M., McKee, M., & Khalil, A. (2006). Multi-time scale stream flow prediction: the support vector machines approach. Journal of Hydrology, 318, 7–16.
25. Asefa, T., Kemblowski, M., Urroz, G., McKee, M., & Khalil, A. (2005). Support vector machines (SVMs) for monitoring networks design. Ground Water, 43(4), 413–422.
26. Behzad, M., Asghari, K., Eazi, M., & Palhang, M. (2009). Generalization performance of support vector machines and neural networks in runoff modeling. Expert Systems with Applications, 36, 7624–7629.
27. Asefa, T., Kemblowski, M. W., Urroz, G., McKee, M., & Khalil, A. (2004). Support vector-based ground water head observation networks design. Water Resources Research, 40(11), W11509.
28. Behzad, M., Asghari, K., & Coppola, E. (2010). Comparative study of SVMs and ANNs in aquifer water level prediction. ASCE Journal of Computing in Civil Engineering, 24(5), 408–413.
29. Liao, Y., Xu, J., & Wang, W. (2011). A method of water quality assessment based on biomonitoring and multiclass support vector machine. Procedia Environmental Sciences, 10, 451–457.
30. Singh, K. P., Basant, N., & Gupta, S. (2011). Support vector machines in water quality management. Analytica Chimica Acta, 703, 152–162.
31. Khader, A. I., & McKee, M. (2014). Use of a relevance vector machine for groundwater quality monitoring network design under uncertainty. Environmental Modelling & Software, 57, 115–126.
32. Smola, A. J., & Schölkopf, B. (2004). A tutorial on support vector regression. Statistics and Computing, 14, 199–222.
33. Chang, C.-C., & Lin, C.-J. (2011). LIBSVM: a library for support vector machines. http://www.csie.ntu.edu.tw/~cjlin/libsvm.
34. Byun, H., & Lee, S. W. (2002). Application of support vector machines for pattern recognition: a survey. Pattern recognition with support vector machines. First international workshop, Niagara Falls, Canada.
35. Noori, R., Karbassi, A. R., Moghaddamnia, K., Han, D., Zokaei-Ashtiani, M. H., Farokhnia, A., & Ghafari Gousheh, N. (2011). Assessment of input variables determination on the SVM model performance using PCA, Gamma test, and forward selection techniques for monthly stream flow prediction. Journal of Hydrology, 401, 177–189.
36. Ustun, B., Melssen, W. J., Oudenhuijzen, M., & Buydens, L. M. C. (2005). Determination of optimal support vector regression parameters by genetic algorithms and simplex optimization. Analytica Chimica Acta, 544, 292–305.
37. Hutchinson, M. F. (1996). A locally adaptive approach to the interpolation of digital elevation models. Third International Conference/Workshop on Integrating GIS and Environmental Modeling, Santa Barbara, CA.
38. Luts, J., Ojeda, F., Van de Plas, R., De Moor, B., Van Huffel, S., & Suykens, J. A. K. (2010). A tutorial on support vector machine-based methods for classification problems in chemometrics. Analytica Chimica Acta, 665(2), 129–145.
39. Basak, D., Pal, S., & Patranabis, D. C. (2007). Support vector regression. Neural Information Processing – Letters and Reviews, 11(10), 203–224.
40. Vapnik, V. N. (1999). The nature of statistical learning theory (2nd ed.). Berlin: Springer.
41. Cherkassky, V., & Ma, Y. (2004). Practical selection of SVM parameters and noise estimation for SVM regression. Neural Networks, 17, 113–126.

Author information

Authors and Affiliations

Department of Civil Engineering, University of Ottawa, 161 Louis Pasteur, Ottawa, ON, Canada, K1N 6N5
Raheleh Arabgol & Majid Sartaj
Department of Civil Engineering, Isfahan University of Technology, 8415683111, Isfahan, Iran
Keyvan Asghari

Corresponding author

Correspondence to Majid Sartaj.

Appendix A. (SVM Model Background)

Originally developed for binary classification problems, SVMs use hyperplanes to define decision boundaries between the data points of different classes [38]. With the introduction of the ε-insensitive loss function, SVMs were extended to solve regression problems [30]. When employed for regression estimation, the method is called support vector regression (SVR) [39]; it was developed from linear classification into nonlinear regression. Nonlinear SVR maps the data into a high-dimensional feature space through a nonlinear mapping (kernel function) and then performs linear regression in that space. Suppose the training data set consists of m vectors {x_i, y_i}, i = 1, …, m, where x_i ∈ Rⁿ is the ith input vector and y_i ∈ R is its corresponding output. In ε-SVR, which is used in this paper, the aim of the learning process is to find a function f(x) that approximates y(x), deviates by at most ε from the observed targets y_i for all the training data, and is at the same time as flat as possible [32, 39]. The objective of SVM is to minimize the structural risk, that is, a regularized risk function combining the empirical error and a regularization term. Estimation errors larger than ε are accommodated by introducing slack variables ξ and ξ*, together with the penalty parameter C. The problem is then equivalent to the following convex constrained quadratic optimization problem:

$$ \text{minimize}\quad R_{reg}\left[f\right]=\frac{1}{2}{\left\Vert w\right\Vert}^2+C\sum_{i=1}^m\left(\xi_i+\xi_i^{\ast}\right)\quad \text{subject to}\quad \begin{cases} w\cdot \phi\left(x_i\right)+b-y_i\le \varepsilon+\xi_i \\ y_i-w\cdot \phi\left(x_i\right)-b\le \varepsilon+\xi_i^{\ast} \\ \xi_i,\ \xi_i^{\ast}\ge 0,\quad i=1,\dots,m \end{cases} $$

(A1)

To obtain

$$ f(x)=\sum_{i=1}^m\left\langle w_i,\phi_i(x)\right\rangle+b\quad \text{with}\quad w\in R^n,\ b\in R $$

(A2)

where w = {w₁, w₂, …, w_m} are the SVM weights, ϕ is the nonlinear mapping (associated with the kernel function) that maps the input vectors, X = {x₁, x₂, …, x_m}, into a higher dimensional feature space, 〈w, ϕ〉 denotes the dot product between w and ϕ(x), and b is the bias. ‖w‖² is the regularization term, which penalizes the complexity of the function f(x) (i.e., the estimated function tends to be flat, avoiding overfitting). The second term is the empirical error measured by the ε-insensitive loss function depicted in Fig. A1. C > 0 is a user-defined constant that determines the trade-off between the flatness of f(x) and the amount up to which deviations larger than ε are tolerated. The ε-insensitive loss function was defined by Vapnik [40] as

Fig. 8

$$ {\left|\xi\right|}_{\varepsilon}={\left|y-f(x)\right|}_{\varepsilon}=\begin{cases} 0 & \text{if } \left|y-f(x)\right|\le \varepsilon \\ \left|y-f(x)\right|-\varepsilon & \text{otherwise} \end{cases} $$

(A3)
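As a small illustration of Eq. A3 and of how it enters the objective in Eq. A1, the sketch below evaluates the ε-insensitive loss and the regularized risk for a given weight vector. It assumes, purely for illustration, a linear feature map ϕ(x) = x, and it uses the fact that for fixed w and b the smallest feasible slack ξ_i + ξ_i* equals the ε-insensitive loss of sample i. The function names and sample values are hypothetical.

```python
def eps_insensitive_loss(y, f_x, eps):
    """Eq. A3: zero inside the eps-tube, linear in the excess deviation outside it."""
    return max(0.0, abs(y - f_x) - eps)

def regularized_risk(w, b, X, y, C, eps):
    """Eq. A1 objective for fixed (w, b), assuming the linear map phi(x) = x.

    For fixed w and b, the smallest feasible xi_i + xi_i* is the
    eps-insensitive loss of sample i, so the slack sum is computed directly.
    """
    flatness = 0.5 * sum(wj * wj for wj in w)
    slack_sum = 0.0
    for xi, yi in zip(X, y):
        f_xi = sum(wj * xj for wj, xj in zip(w, xi)) + b
        slack_sum += eps_insensitive_loss(yi, f_xi, eps)
    return flatness + C * slack_sum

# Two hypothetical samples: the first lies inside the tube, the second 0.5 outside it.
risk = regularized_risk(w=[2.0], b=0.0, X=[[1.0], [2.0]], y=[2.0, 5.0], C=10.0, eps=0.5)
```

Here the flatness term is 0.5 · 2² = 2 and the slack term is 10 · 0.5 = 5, so `risk` evaluates to 7.0. A larger C makes the same out-of-tube deviation more expensive relative to the flatness term.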

Usually, Eq. A1 is solved in its dual form using Lagrange multipliers. Transforming this quadratic programming problem to its corresponding dual optimization problem and introducing the kernel function in order to achieve the nonlinearity yields the optimal regression function as [40, 41]

$$ f(x)=\sum_{i=1}^m\left(\alpha_i^{\ast}-\alpha_i\right)K\left(x_i,x\right)+b $$

(A4)

where the Lagrange multipliers α_i and α_i* are required to be non-negative for i = 1, …, m, and K(x_i, x) is a kernel function defined as an inner product in the feature space as follows:

$$ K\left(x_i,x\right)=\phi\left(x_i\right)\cdot \phi(x) $$

(A5)

As a result, the input vectors that correspond to nonzero Lagrangian multipliers, α_i and α_i*, are considered as the support vectors. The SVM model thus is formulated based on these vectors and is guaranteed to have a global, unique, and sparse solution [1].
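The kernel expansion of Eq. A4 can be sketched in a few lines. The radial basis function (RBF) kernel, the parameter `gamma`, and all numerical values below are assumptions chosen for illustration; the kernel actually used by the authors is not stated in this appendix. Only training points with a nonzero coefficient α_i* − α_i (the support vectors) need to be stored, which is the sparsity mentioned above.

```python
import math

def rbf_kernel(xi, x, gamma):
    """RBF kernel K(xi, x) = exp(-gamma * ||xi - x||^2); an assumed kernel choice."""
    return math.exp(-gamma * sum((a - c) ** 2 for a, c in zip(xi, x)))

def svr_predict(x, support_vectors, dual_coefs, b, gamma):
    """Eq. A4: f(x) = sum_i (alpha_i* - alpha_i) K(x_i, x) + b.

    dual_coefs[i] holds the difference alpha_i* - alpha_i for support vector i.
    """
    return sum(
        coef * rbf_kernel(sv, x, gamma)
        for sv, coef in zip(support_vectors, dual_coefs)
    ) + b

# Hypothetical fitted model with two support vectors in one dimension.
support_vectors = [[0.0], [1.0]]
dual_coefs = [0.5, -0.25]
bias = 0.1

near = svr_predict([0.0], support_vectors, dual_coefs, bias, gamma=1.0)
far = svr_predict([100.0], support_vectors, dual_coefs, bias, gamma=1.0)
```

With an RBF kernel, a query point far from every support vector receives a prediction that collapses to the bias b, which is one reason predictions outside the sampled range of the inputs should be treated with caution.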

About this article

Cite this article

Arabgol, R., Sartaj, M. & Asghari, K. Predicting Nitrate Concentration and Its Spatial Distribution in Groundwater Resources Using Support Vector Machines (SVMs) Model. Environ Model Assess 21, 71–82 (2016). https://doi.org/10.1007/s10666-015-9468-0

Received: 25 April 2014
Accepted: 26 May 2015
Published: 05 June 2015
Issue Date: January 2016
DOI: https://doi.org/10.1007/s10666-015-9468-0
