Knowledge Inference from a Small Water Quality Dataset with Multivariate Statistics and Data-Mining

  • Conference paper
  • First Online:
Advances in Information and Communication Technologies for Adapting Agriculture to Climate Change (AACC'17 2017)

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 687)

Abstract

Multivariate analysis (MV) and data mining (DM) techniques were applied to a small water quality dataset obtained from the surface waters at three water quality monitoring stations in the Petaquilla River Basin, Panama, over the 2008–2011 hydrological period, to assess and understand the ongoing environmental stress within the river basin. Principal component and factor analysis (PCA/FA) indicated that the factors changing water quality differed between the two seasons. During the dry (low-flow) season, water quality was strongly influenced by turbidity (NTU) and total suspended solids (TSS) concentrations. In contrast, during the wet (high-flow) season the main sources of change in water quality were characterized by an inverse relation of NTU and TSS with electrical conductivity (EC) and chlorides (CL), followed by significant sources of agricultural pollution. To complement the MV analysis, DM techniques such as cluster analysis (CA) and classification (CLA) were applied to the data. Cluster analysis was used to separate the stations by their levels of pollution, and the C5.0 algorithm was used to classify stations of unknown origin into one of several known groups of water quality constituents. The study demonstrated that the major water pollution threats to the Petaquilla River Basin are industrial and urban development, together with agricultural and grazing land uses, which are non-point sources. Given the limited data, these methodologies are considered useful in helping water managers implement water monitoring campaigns and set priorities for improving and protecting water quality sources impaired by land disturbances from anthropogenic activities.
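The PCA step described above can be illustrated with a minimal, pure-Python sketch. It assumes made-up readings for the four variables named in the abstract (NTU, TSS, EC, CL), standardizes them, builds their correlation matrix, and extracts only the first unrotated principal component by power iteration. The function names and data are hypothetical; the sketch does not reproduce the authors' factor analysis, any factor rotation, the cluster analysis, or the C5.0 classification.

```python
import math


def standardize(column):
    """Return z-scores of one variable (population standard deviation)."""
    n = len(column)
    mean = sum(column) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in column) / n)
    return [(x - mean) / sd for x in column]


def correlation_matrix(columns):
    """Pearson correlation matrix for a list of equally long variables."""
    z = [standardize(c) for c in columns]
    n, p = len(z[0]), len(z)
    # With population z-scores, r_ij = sum_k z_ik * z_jk / n.
    return [[sum(z[i][k] * z[j][k] for k in range(n)) / n for j in range(p)]
            for i in range(p)]


def first_component(matrix, iterations=500):
    """Leading eigenvalue and unit eigenvector of a symmetric PSD matrix.

    Uses power iteration. The start vector is arbitrary but non-uniform so it
    is unlikely to be orthogonal to the leading eigenvector.
    """
    p = len(matrix)
    v = [float(i + 1) for i in range(p)]
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient v^T M v gives the eigenvalue for a unit vector v.
    mv = [sum(matrix[i][j] * v[j] for j in range(p)) for i in range(p)]
    eigenvalue = sum(a * b for a, b in zip(v, mv))
    return eigenvalue, v


if __name__ == "__main__":
    # Hypothetical readings for one station (NOT data from the study).
    names = ["NTU", "TSS", "EC", "CL"]
    data = [
        [12.0, 35.0, 20.0, 60.0, 8.0, 45.0],         # turbidity, NTU
        [10.0, 30.0, 18.0, 55.0, 6.0, 40.0],         # total suspended solids
        [210.0, 150.0, 190.0, 110.0, 230.0, 130.0],  # electrical conductivity
        [9.0, 6.5, 8.0, 4.5, 10.0, 5.5],             # chlorides
    ]
    lam, weights = first_component(correlation_matrix(data))
    # For a correlation matrix the trace equals the number of variables,
    # so lam / p is the share of total variance on the first component.
    print(f"eigenvalue = {lam:.3f}, explained = {lam / len(names):.1%}")
    for name, value in zip(names, weights):
        print(f"{name}: {value:+.3f}")
```

With these invented values, the weights for NTU and TSS come out with the opposite sign to those for EC and CL, which is the kind of inverse relation the abstract reports for the wet season. A real analysis would normally use a statistics package, retain several components, and possibly rotate them.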

Acknowledgement

The authors would like to thank the Environmental Department of Minera Panama S.A. for providing the necessary data. This work has been partially supported by the Spanish MICINN under projects TRA2015-63708-R and TRA2016-78886-C3-1-R.

Author information

Corresponding author

Correspondence to Jose Simmonds.

Copyright information

© 2018 Springer International Publishing AG

About this paper

Cite this paper

Simmonds, J., Gómez, J.A., Ledezma, A. (2018). Knowledge Inference from a Small Water Quality Dataset with Multivariate Statistics and Data-Mining. In: Angelov, P., Iglesias, J., Corrales, J. (eds) Advances in Information and Communication Technologies for Adapting Agriculture to Climate Change. AACC'17 2017. Advances in Intelligent Systems and Computing, vol 687. Springer, Cham. https://doi.org/10.1007/978-3-319-70187-5_1

  • DOI: https://doi.org/10.1007/978-3-319-70187-5_1

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-70186-8

  • Online ISBN: 978-3-319-70187-5

  • eBook Packages: Engineering (R0)
