Abstract
In this paper, a support vector machine (SVM) model was developed to predict nitrate concentration in the groundwater of the Arak plain, Iran. The model provides a tool for predicting nitrate concentration from a set of easily measurable groundwater quality variables used as inputs: water temperature, electrical conductivity, groundwater depth, total dissolved solids, dissolved oxygen, pH, land use, and season of the year. The data set comprised 160 water samples from 40 wells monitored over 1 year. The parameters of the optimum SVM model were obtained using a combination of 4-fold cross-validation and a grid search. The optimum model was then used to predict nitrate concentration in the Arak plain aquifer. For the training and test data sets, the SVM predictions correlated well with the measured values (correlation coefficients of 0.92 and 0.87, respectively) and had low root mean squared errors (0.086 and 0.111, respectively). Finally, maps of nitrate concentration in groundwater were prepared for all four seasons using the trained SVM model and a geographic information system (GIS) interpolation scheme, and were compared with the results of a physics-based (flow and contaminant transport) model. Overall, the results showed that the SVM model could be used as a fast, reliable, and cost-effective method for assessing and predicting groundwater quality.
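The parameter-selection step described in the abstract (a grid search combined with 4-fold cross-validation) and the two reported skill scores (correlation coefficient and RMSE) can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the authors' code: folds are assigned at random, the mean validation RMSE is assumed to be the selection criterion, and, because no SVM implementation is given here, a hypothetical k-nearest-neighbour regressor (`knn_fit_predict`) stands in for the ε-SVR so that the example runs with the standard library alone. In the paper's setting the grid would range over SVR parameters such as C and ε.

```python
import math
import random
from itertools import product


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def k_fold_indices(n, k=4, seed=0):
    """Shuffle sample indices and deal them into k folds of near-equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def grid_search_cv(X, y, fit_predict, grid, k=4):
    """Return (best_params, mean_rmse) over every parameter combination in `grid`.

    fit_predict(X_train, y_train, X_test, **params) may be any regressor
    (assumption: an epsilon-SVR wrapper would be used in practice).
    """
    folds = k_fold_indices(len(X), k)
    names = list(grid)
    best_params, best_score = None, math.inf
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        fold_scores = []
        for test_idx in folds:
            held_out = set(test_idx)
            train_idx = [i for i in range(len(X)) if i not in held_out]
            pred = fit_predict([X[i] for i in train_idx], [y[i] for i in train_idx],
                               [X[i] for i in test_idx], **params)
            fold_scores.append(rmse([y[i] for i in test_idx], pred))
        mean_score = sum(fold_scores) / k
        if mean_score < best_score:
            best_params, best_score = params, mean_score
    return best_params, best_score


def knn_fit_predict(X_train, y_train, X_test, n_neighbors):
    """Hypothetical stand-in regressor: mean target of the n nearest training points."""
    preds = []
    for x in X_test:
        order = sorted(range(len(X_train)),
                       key=lambda i: sum((a - b) ** 2 for a, b in zip(X_train[i], x)))
        preds.append(sum(y_train[i] for i in order[:n_neighbors]) / n_neighbors)
    return preds


# Synthetic 1-D example: 40 samples of y = x^2 on [0, 1).
X = [[i / 40] for i in range(40)]
y = [x[0] ** 2 for x in X]
best, score = grid_search_cv(X, y, knn_fit_predict, {"n_neighbors": [1, 3, 10, 30]})
print(best, round(score, 3))
```

Replacing `knn_fit_predict` with a wrapper around an actual ε-SVR solver leaves the cross-validation harness unchanged; only the keys and values of the grid dictionary change.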
References
[1] Khalil, A., Almasri, M. N., McKee, M., & Kaluarachchi, J. J. (2005). Applicability of statistical learning algorithms in groundwater quality modeling. Water Resources Research, 41, W05010. doi:10.1029/2004WR003608.
[2] Yoon, H., Jun, J., Hyun, Y., Bae, G., & Lee, K. (2011). Comparative study of artificial neural networks and support vector machines for predicting groundwater levels in a coastal aquifer. Journal of Hydrology, 396, 128–138.
[3] Babiker, I. S., Mohamed, M. A. A., Terao, H., Kato, K., & Ohta, K. (2003). Assessment of groundwater contamination by nitrate leaching from intensive vegetable cultivation using geographical information system. Environment International, 29, 1009–1017.
[4] Thirumalaivasan, D., Karmegam, M., & Venugopal, K. (2003). AHP-DRASTIC: software for specific aquifer vulnerability assessment using DRASTIC model and GIS. Environmental Modelling & Software, 18(7), 645–656.
[5] Kalivarapu, V., & Winer, E. (2008). A multi-fidelity software framework for interactive modeling of advective and diffusive contaminant transport in groundwater. Environmental Modelling & Software, 23(12), 1370–1383.
[6] Chesnaux, R., & Allen, D. M. (2008). Simulating nitrate leaching profiles in a highly permeable vadose zone. Environmental Modeling and Assessment, 13, 527–539.
[7] Tutmez, B., & Hatipoglu, Z. (2010). Comparing two data driven interpolation methods for modeling nitrate distribution in aquifer. Ecological Informatics, 5, 311–315.
[8] Almasri, M. N., & Kaluarachchi, J. J. (2007). Modular neural networks to predict the nitrate distribution in ground water using the on-ground nitrogen loading and recharge data. Journal of Hydrology, 343, 211–229.
[9] Almasri, M. N., & Kaluarachchi, J. J. (2004). Implications of on-ground nitrogen loading and soil transformations on ground water quality management. Journal of the American Water Resources Association, 40, 165–186.
[10] Schnobrich, M. R., Chaplin, B. P., Semmens, M. J., & Novak, P. J. (2007). Stimulating hydrogenotrophic denitrification in simulated groundwater containing high dissolved oxygen and nitrate concentrations. Water Research, 41(9), 1869–1876.
[11] Gardner, K. K., & Vogel, R. M. (2005). Predicting ground water nitrate concentration from land use. Ground Water, 43(3), 343–352.
[12] USEPA (U.S. Environmental Protection Agency). (2009). 2009 Edition of the Drinking Water Standards and Health Advisories. EPA 822-R-09-011, Office of Water, Washington, DC, USA.
[13] Wagner, B. J. (1992). Simultaneous parameter estimation and contaminant source characterization for coupled groundwater flow and contaminant transport modeling. Journal of Hydrology, 135, 275–303.
[14] Hassan, A., & Hamed, K. H. (2001). Prediction of plume migration in heterogeneous media using artificial neural networks. Water Resources Research, 37(3), 605–623.
[15] Kunstmann, H., Kinzelbach, W., & Siegfried, T. (2002). Conditional first order second moment method and its application to the quantification of uncertainty in groundwater modeling. Water Resources Research, 38(4), 1035. doi:10.1029/2000WR000022.
[16] Liu, S., Tucker, P., & Mansell, M. (2010). A conceptual nitrate transport model and its application at different scales. Environmental Modeling and Assessment, 15, 251–259.
[17] Almasri, M. N., & Kaluarachchi, J. J. (2005). Multi-criteria decision analysis for the optimal management of nitrate contamination of aquifers. Journal of Environmental Management, 74, 365–381.
[18] Dixon, B. (2009). A case study using support vector machines, neural networks and logistic regression in a GIS to identify wells contaminated with nitrate-N. Hydrogeology Journal, 17, 1507–1520.
[19] Maier, H. R., Jain, A., Dandy, G. C., & Sudheer, K. P. (2010). Methods used for the development of neural networks for the prediction of water resource variables in river systems: current status and future directions. Environmental Modelling & Software, 25, 891–909.
[20] Liu, J. P., Chang, M. Q., & Ma, X. Y. (2009). Groundwater quality assessment based on support vector machine. In HAIHE River Basin Research and Planning Approach: Proceedings of 2009 International Symposium of HAIHE Basin Integrated Water and Environment Management (pp. 173–178). Beijing, China.
[21] Vapnik, V. N. (1998). Statistical learning theory. New York: John Wiley.
[22] Dibike, Y. B., Velickov, S., Solomatine, D. P., & Abbott, M. B. (2001). Model induction with support vector machines: introduction and application. ASCE Journal of Computing in Civil Engineering, 15(3), 208–216.
[23] Liong, S. Y., & Sivapragasam, C. (2002). Flood stage forecasting with support vector machines. Journal of the American Water Resources Association, 38(1), 173–186.
[24] Asefa, T., Kemblowski, M., McKee, M., & Khalil, A. (2006). Multi-time scale stream flow prediction: the support vector machines approach. Journal of Hydrology, 318, 7–16.
[25] Asefa, T., Kemblowski, M., Urroz, G., McKee, M., & Khalil, A. (2005). Support vector machines (SVMs) for monitoring networks design. Ground Water, 43(4), 413–422.
[26] Behzad, M., Asghari, K., Eazi, M., & Palhang, M. (2009). Generalization performance of support vector machines and neural networks in runoff modeling. Expert Systems with Applications, 36, 7624–7629.
[27] Asefa, T., Kemblowski, M. W., Urroz, G., McKee, M., & Khalil, A. (2004). Support vector-based ground water head observation networks design. Water Resources Research, 40(11), W11509.
[28] Behzad, M., Asghari, K., & Coppola, E. (2010). Comparative study of SVMs and ANNs in aquifer water level prediction. ASCE Journal of Computing in Civil Engineering, 24(5), 408–413.
[29] Liao, Y., Xu, J., & Wang, W. (2011). A method of water quality assessment based on biomonitoring and multiclass support vector machine. Procedia Environmental Sciences, 10, 451–457.
[30] Singh, K. P., Basant, N., & Gupta, S. (2011). Support vector machines in water quality management. Analytica Chimica Acta, 703, 152–162.
[31] Khader, A. I., & McKee, M. (2014). Use of a relevance vector machine for groundwater quality monitoring network design under uncertainty. Environmental Modelling & Software, 57, 115–126.
[32] Smola, A. J., & Schölkopf, B. (2004). A tutorial on support vector regression. Statistics and Computing, 14, 199–222.
[33] Chang, C.-C., & Lin, C.-J. (2011). LIBSVM: a library for support vector machines. http://www.csie.ntu.edu.tw/~cjlin/libsvm.
[34] Byun, H., & Lee, S. W. (2002). Application of support vector machines for pattern recognition: a survey. In Pattern Recognition with Support Vector Machines: First International Workshop, Niagara Falls, Canada.
[35] Noori, R., Karbassi, A. R., Moghaddamnia, K., Han, D., Zokaei-Ashtiani, M. H., Farokhnia, A., & Ghafari Gousheh, N. (2011). Assessment of input variables determination on the SVM model performance using PCA, Gamma test, and forward selection techniques for monthly stream flow prediction. Journal of Hydrology, 401, 177–189.
[36] Ustun, B., Melssen, W. J., Oudenhuijzen, M., & Buydens, L. M. C. (2005). Determination of optimal support vector regression parameters by genetic algorithms and simplex optimization. Analytica Chimica Acta, 544, 292–305.
[37] Hutchinson, M. F. (1996). A locally adaptive approach to the interpolation of digital elevation models. In Third International Conference/Workshop on Integrating GIS and Environmental Modeling, Santa Barbara, CA.
[38] Luts, J., Ojeda, F., Van de Plas, R., De Moor, B., Van Huffel, S., & Suykens, J. A. K. (2010). A tutorial on support vector machine-based methods for classification problems in chemometrics. Analytica Chimica Acta, 665(2), 129–145.
[39] Basak, D., Pal, S., & Patranabis, D. C. (2007). Support vector regression. Neural Information Processing – Letters and Reviews, 11(10), 203–224.
[40] Vapnik, V. N. (1999). The nature of statistical learning theory (2nd ed.). Berlin: Springer.
[41] Cherkassky, V., & Ma, Y. (2004). Practical selection of SVM parameters and noise estimation for SVM regression. Neural Networks, 17, 113–126.
Appendix A. (SVM Model Background)
Originally developed for binary classification problems, SVMs use hyperplanes to define decision boundaries between data points of different classes [38]. With the introduction of the ε-insensitive loss function, SVMs were extended to regression problems [30]; SVM methods applied to regression estimation are known as support vector regression (SVR) [39]. SVR thus extends linear classification to nonlinear regression. Nonlinear SVR maps the data into a high-dimensional feature space through a nonlinear mapping (associated with a kernel function) and then performs linear regression in that space. Suppose the training data set consists of $m$ pairs $\{x_i, y_i\}$, $i = 1, \ldots, m$, where $x_i \in \mathbb{R}^n$ is the $i$th input vector and $y_i \in \mathbb{R}$ is the corresponding output. In ε-SVR, which is used in this paper, the aim of the learning process is to find a function $f(x)$ approximating $y(x)$ that deviates by at most $\varepsilon$ from the observed targets $y_i$ for all training data while being as flat as possible [32, 39]. The SVM objective is to minimize the structural risk, expressed as the regularized risk function, which combines the empirical error with a regularization term. Deviations larger than $\varepsilon$ are accommodated by introducing slack variables $\xi_i$ and $\xi_i^*$ together with a penalty parameter $C$. The resulting problem is equivalent to the following constrained convex quadratic optimization problem:
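$$\min_{w,\,b,\,\xi,\,\xi^*}\ \frac{1}{2}\|w\|^2 + C\sum_{i=1}^{m}\left(\xi_i + \xi_i^*\right) \tag{A1}$$

subject to

$$y_i - \langle w, \phi(x_i)\rangle - b \le \varepsilon + \xi_i, \qquad \langle w, \phi(x_i)\rangle + b - y_i \le \varepsilon + \xi_i^*, \qquad \xi_i,\ \xi_i^* \ge 0, \qquad i = 1, \ldots, m,$$

to obtain

$$f(x) = \langle w, \phi(x)\rangle + b,$$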
where $w$ is the vector of SVM weights; $\phi$ is the nonlinear mapping, associated with the kernel function, that maps the input vectors $X = \{x_1, x_2, \ldots, x_m\}$ into a higher-dimensional feature space; $\langle w, \phi(x)\rangle$ denotes the dot product of $w$ and $\phi(x)$; and $b$ is the bias. $\|w\|^2$ is the regularization term, which controls the complexity of $f(x)$ by favoring flat functions and thereby helps avoid overfitting. The second term represents the ε-insensitive loss function depicted in Fig. A1. $C > 0$ is a user-defined constant that determines the trade-off between the flatness of $f(x)$ and the extent to which deviations larger than $\varepsilon$ are tolerated. The ε-insensitive loss function was defined by Vapnik [40] as

$$|y - f(x)|_{\varepsilon} = \begin{cases} 0, & \text{if } |y - f(x)| \le \varepsilon \\ |y - f(x)| - \varepsilon, & \text{otherwise} \end{cases}$$
Fig. 8
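To make the roles of the two terms concrete, the sketch below evaluates Vapnik's ε-insensitive loss and the standard ε-SVR primal objective, ½‖w‖² + C Σ(ξ_i + ξ_i*), for a candidate solution. It is a minimal illustration that assumes a linear (identity) feature map ϕ(x) = x so that w can be written out explicitly; the function names are hypothetical. For fixed w and b, the smallest slacks satisfying the two constraints are exactly the ε-insensitive losses, which is why the second term of the objective acts as the loss.

```python
def eps_insensitive_loss(y, f, eps):
    """Vapnik's epsilon-insensitive loss: zero inside the tube |y - f| <= eps."""
    return max(0.0, abs(y - f) - eps)


def min_slacks(y, f, eps):
    """Smallest slack pair (xi, xi_star) satisfying both primal constraints."""
    xi = max(0.0, y - f - eps)       # target lies above the tube
    xi_star = max(0.0, f - y - eps)  # target lies below the tube
    return xi, xi_star


def primal_objective(w, b, X, Y, C, eps):
    """0.5*||w||^2 + C*sum(xi + xi_star) for f(x) = <w, x> + b.

    An identity feature map is assumed for illustration.
    """
    regularizer = 0.5 * sum(wj * wj for wj in w)
    penalty = 0.0
    for x, y in zip(X, Y):
        f = sum(wj * xj for wj, xj in zip(w, x)) + b
        xi, xi_star = min_slacks(y, f, eps)
        # With eps >= 0 at most one slack is nonzero, so xi + xi_star
        # equals eps_insensitive_loss(y, f, eps).
        penalty += xi + xi_star
    return regularizer + C * penalty


# Two 1-D samples; the second lies 1.0 above the line, i.e. 0.5 outside the tube.
print(primal_objective(w=[1.0], b=0.0, X=[[1.0], [2.0]], Y=[1.0, 3.0], C=10.0, eps=0.5))  # 5.5
```

Raising C makes the same violation more expensive relative to the flatness term, which is the trade-off described for C above.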
Eq. A1 is usually solved in its dual form using Lagrange multipliers. Transforming this quadratic programming problem into its dual and introducing the kernel function to achieve nonlinearity yields the optimal regression function [40, 41]

$$f(x) = \sum_{i=1}^{m}\left(\alpha_i - \alpha_i^*\right) K(x_i, x) + b$$
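where the Lagrange multipliers $\alpha_i$ and $\alpha_i^*$ are non-negative and bounded above by $C$ ($0 \le \alpha_i, \alpha_i^* \le C$) for $i = 1, \ldots, m$, and $K(x_i, x)$ is a kernel function defined as an inner product in the feature space:

$$K(x_i, x) = \langle \phi(x_i), \phi(x) \rangle$$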
As a result, the input vectors that correspond to nonzero Lagrange multipliers $\alpha_i$ or $\alpha_i^*$ are the support vectors. The SVM model is thus formulated in terms of these vectors alone and, because the underlying optimization problem is convex, has a global, unique, and sparse solution [1].
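Once the dual problem has been solved (for example with a solver such as LIBSVM), prediction needs only the support vectors and the kernel. The sketch below is a minimal illustration under stated assumptions: the Gaussian radial basis function (RBF) kernel is assumed, since the appendix does not fix a kernel here; the multipliers are made-up values rather than the output of an actual solver; and the function names are hypothetical. The values are chosen so that Σ(α_i − α_i*) = 0, the equality constraint of the ε-SVR dual.

```python
import math
from functools import partial


def rbf_kernel(u, v, gamma):
    """Gaussian RBF kernel K(u, v) = exp(-gamma * ||u - v||^2); an assumed kernel choice."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))


def support_vector_indices(alpha, alpha_star, tol=1e-8):
    """Indices whose multiplier alpha_i or alpha_i* is nonzero (above tol)."""
    return [i for i, (a, a_s) in enumerate(zip(alpha, alpha_star)) if a > tol or a_s > tol]


def svr_predict(x, X_train, alpha, alpha_star, b, kernel):
    """f(x) = sum_i (alpha_i - alpha_i*) K(x_i, x) + b, summed over support vectors only."""
    return b + sum((alpha[i] - alpha_star[i]) * kernel(X_train[i], x)
                   for i in support_vector_indices(alpha, alpha_star))


# Made-up multipliers for three training points; the middle point has zero
# multipliers, as for a point lying inside the epsilon-tube.
X_train = [[0.0], [1.0], [2.0]]
alpha = [0.5, 0.0, 0.0]
alpha_star = [0.0, 0.0, 0.5]
kernel = partial(rbf_kernel, gamma=1.0)
print(support_vector_indices(alpha, alpha_star))  # [0, 2]
print(svr_predict([1.0], X_train, alpha, alpha_star, b=1.0, kernel=kernel))  # 1.0
```

At x = 1 the two support vectors are equidistant, so their opposite-signed contributions cancel and the prediction equals the bias; the middle training point never enters the sum, which is the sparsity noted above.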
Cite this article
Arabgol, R., Sartaj, M. & Asghari, K. Predicting Nitrate Concentration and Its Spatial Distribution in Groundwater Resources Using Support Vector Machines (SVMs) Model. Environ Model Assess 21, 71–82 (2016). https://doi.org/10.1007/s10666-015-9468-0
DOI: https://doi.org/10.1007/s10666-015-9468-0