Abstract
The accurate assessment of river quality is essential for human health, ecosystem functionality, economic growth, and future population growth. River quality assessment practice typically uses the Water Quality Index (WQI) to compute WQI values for a river, together with multivariate statistics to analyze its multiple chemical and physical variables. However, the large volume of collected data, the difficulty of handling it, and the complex and uncertain physical, chemical, and biological influences on water quality parameter values call for a different approach to classifying river quality. Therefore, this study offers different techniques and a comparative study to find optimal strategies for river quality assessment using two major families of Artificial Intelligence (AI) algorithms: Machine Learning (ML) and Deep Learning (DL). Before searching for the optimal strategies, this study proposes different preprocessing techniques combined with dimensionality reduction to obtain an optimal model fit with less feature imbalance. The ML algorithms include both unsupervised and supervised learning. The unsupervised learners are Hierarchical Clustering (HC) and K-Means (KM), whereas the ten supervised learners include K-Nearest Neighbors (KNN), Logistic Regression (LR), Support Vector Machine (SVM), Lasso Regression (LAR), Ridge Regression (RR), Linear Discriminant Analysis (LDA), Naïve Bayes (NB), Decision Tree (DT), and K-Means (KM). This study also includes two DL models: a Deep Learning Neural Network (DLNN) and a Multi-Layer Perceptron (MLP). In addition, this paper offers different tuning procedures to improve the algorithms' accuracies. Results show that HC is able to divide the polluted area into five levels of water pollution, and KM suggests the optimal number of clusters. Meanwhile, the ten supervised learners and the two DL methods all produce accurate and efficient results for classifying river quality.
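The unsupervised part of the workflow (preprocessing, dimensionality reduction, HC into five pollution levels, and KM to suggest the number of clusters) can be sketched with scikit-learn. This is a minimal sketch under stated assumptions, not the authors' code: the Department of Environment Malaysia data are not public, so `make_blobs` generates a synthetic stand-in, and the six hypothetical parameters, the 90% PCA variance threshold, Ward linkage, and the silhouette criterion for choosing k are all illustrative choices (the study may have used a different criterion, such as the elbow method).

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for water quality measurements (the study's data are confidential).
# Hypothetical shape: 300 samples x 6 parameters (e.g. DO, BOD, COD, SS, pH, NH3-N).
X, _ = make_blobs(n_samples=300, n_features=6, centers=5, random_state=42)

# Preprocessing: standardize each parameter to zero mean and unit variance
X_scaled = StandardScaler().fit_transform(X)

# Dimensionality reduction: keep enough principal components to explain 90% of the variance
pca = PCA(n_components=0.90, svd_solver="full")
X_reduced = pca.fit_transform(X_scaled)

# Hierarchical clustering (Ward linkage), cut into five pollution levels
hc = AgglomerativeClustering(n_clusters=5, linkage="ward")
hc_labels = hc.fit_predict(X_reduced)


def best_k(data, k_range=range(2, 11)):
    """Fit K-Means for each candidate k and return the k with the highest silhouette score."""
    scores = {}
    for k in k_range:
        labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(data)
        scores[k] = silhouette_score(data, labels)
    return max(scores, key=scores.get), scores


k_opt, sil_scores = best_k(X_reduced)
print("PCA components kept:", pca.n_components_)
print("HC cluster sizes:", np.bincount(hc_labels))
print("Silhouette-optimal k:", k_opt)
```

Running the script prints the number of retained components, the size of each of the five HC clusters, and the silhouette-optimal k. Plotting `KMeans(...).inertia_` against k instead gives the elbow-method alternative.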
Thus, these techniques and models offer an alternative approach that can handle large volumes of data and different types of parameters to obtain accurate river quality assessments.
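The supervised comparison and tuning steps can be sketched in the same way. Again, this is a minimal sketch with its assumptions labeled, not the authors' implementation: `make_classification` produces synthetic data with five hypothetical quality classes; Lasso and Ridge regression are represented by their scikit-learn classification analogues (an L1-penalized `LogisticRegression` and `RidgeClassifier`); scikit-learn's `MLPClassifier` stands in for the study's MLP; and the SVM grid of `C` and `gamma` values is an arbitrary example of tuning. The printed accuracies describe the synthetic data only and say nothing about the study's results.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression, RidgeClassifier
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in: 5 hypothetical river quality classes (e.g. WQI classes I to V)
X, y = make_classification(n_samples=600, n_features=10, n_informative=6,
                           n_classes=5, n_clusters_per_class=1, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42)

models = {
    "KNN": KNeighborsClassifier(),
    "LR": LogisticRegression(max_iter=1000),
    "SVM": SVC(),
    "Lasso-style LR (L1)": LogisticRegression(penalty="l1", solver="saga", max_iter=5000),
    "Ridge": RidgeClassifier(),
    "LDA": LinearDiscriminantAnalysis(),
    "NB": GaussianNB(),
    "DT": DecisionTreeClassifier(random_state=0),
    "MLP": MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=2000, random_state=0),
}

results = {}
for name, clf in models.items():
    # Same preprocessing for every model: scaling, then PCA keeping 95% of the variance
    pipe = make_pipeline(StandardScaler(), PCA(n_components=0.95, svd_solver="full"), clf)
    pipe.fit(X_train, y_train)
    results[name] = pipe.score(X_test, y_test)  # test-set accuracy

# Hyperparameter tuning example: 5-fold grid search over the SVM's C and gamma
svm_pipe = make_pipeline(StandardScaler(), SVC())
grid = GridSearchCV(svm_pipe,
                    {"svc__C": [0.1, 1, 10], "svc__gamma": ["scale", 0.1, 0.01]},
                    cv=5)
grid.fit(X_train, y_train)
results["SVM (tuned)"] = grid.score(X_test, y_test)

# Rank the models by test-set accuracy
for name, acc in sorted(results.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:22s} {acc:.3f}")
```

Wrapping scaling and PCA in a pipeline means they are fitted on the training split only (and refitted inside each cross-validation fold during the grid search), so no information from the test split leaks into preprocessing.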
Data availability
The datasets generated and/or analysed during the current study are not publicly available because the research data are confidential and belong to the Department of Environment Malaysia, but they are available from the corresponding author on reasonable request.
Acknowledgements
We would like to extend our gratitude to the Department of Environment Malaysia for permission to conduct this study. Enormous appreciation and special thanks go to the experts at the Department of Environment Malaysia for their valuable contribution to this study. The authors would also like to thank Malaysia's Ministry of Higher Education (MOHE) for supporting this research.
Funding
This study was funded by the Malaysian Ministry of Higher Education (FRGS-RACER: RACER/1/2019/STG06/UNISZA//).
Ethics declarations
Conflict of interest
The authors declare that they have no conflicts of interest.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Zamri, N., Pairan, M.A., Azman, W.N.A.W. et al. Finding optimal strategies for river quality assessment using machine learning and deep learning models. Model. Earth Syst. Environ. 9, 615–629 (2023). https://doi.org/10.1007/s40808-022-01494-4