
A Spectral Clustering Method for Large-Scale Geostatistical Datasets

  • Conference paper
Machine Learning and Data Mining in Pattern Recognition (MLDM 2017)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10358)

Abstract

Spectral clustering is one of the most popular modern clustering techniques for conventional data. However, applying the general spectral clustering method to geostatistical data poses two challenges. First, it produces clusters that are spatially non-contiguous, which is undesirable for many geoscience applications. Second, its high computational complexity limits its applicability to large-scale problems. This paper presents a spectral clustering method dedicated to large-scale geostatistical datasets in which spatial dependence plays an important role. It extends previous work to large-scale geostatistical datasets by computing the similarity matrix only at a reduced set of locations over the study domain, referred to as anchor locations. All data are used when computing the similarity matrix at the anchor locations, so no data are sacrificed. The spectral clustering algorithm can then be performed efficiently on this similarity matrix at the anchor locations rather than at all data locations. Given the resulting cluster labels of the anchor locations, a weighted k-nearest-neighbour classifier is trained using their geographical coordinates as covariates and their cluster labels as the response. Cluster membership is then assigned to all data locations by applying the trained classifier. The effectiveness of the proposed method in discovering spatially contiguous and meaningful clusters in large-scale geostatistical datasets is illustrated using the US National Geochemical Survey database.
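
The pipeline outlined in the abstract (anchor locations, a similarity matrix at the anchors built from all data, spectral clustering of the anchors, then a weighted k-nearest-neighbour classifier on coordinates) can be illustrated with a minimal NumPy sketch. This is not the paper's implementation. The abstract does not specify the similarity measure, so the sketch assumes a stand-in: attribute values are kernel-averaged at each anchor using every observation, and a Gaussian similarity is taken between the smoothed values. The inverse-distance weighting in the classifier, the regular anchor grid, the synthetic data, and all function names and parameter values are likewise assumptions made for illustration.

```python
import numpy as np

def kernel_average(coords, values, anchors, bandwidth):
    """Assumed stand-in: Gaussian-kernel average of ALL observations at each anchor."""
    d2 = ((anchors[:, None, :] - coords[None, :, :]) ** 2).sum(-1)
    w = np.exp(-0.5 * d2 / bandwidth**2)
    return (w @ values) / w.sum(axis=1, keepdims=True)

def spectral_labels(features, n_clusters, sigma, n_iter=50):
    """Normalized spectral clustering of the anchors (Gaussian similarity assumed)."""
    d2 = ((features[:, None, :] - features[None, :, :]) ** 2).sum(-1)
    W = np.exp(-0.5 * d2 / sigma**2)
    deg = W.sum(axis=1)
    lap = np.eye(len(W)) - W / np.sqrt(np.outer(deg, deg))
    _, vecs = np.linalg.eigh(lap)            # eigenvalues in ascending order
    emb = vecs[:, :n_clusters]
    emb = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    # Plain k-means on the embedding, with deterministic farthest-point init.
    centers = [emb[0]]
    for _ in range(1, n_clusters):
        dist = np.min([((emb - c) ** 2).sum(1) for c in centers], axis=0)
        centers.append(emb[dist.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        lab = ((emb[:, None, :] - centers[None, :, :]) ** 2).sum(-1).argmin(1)
        for c in range(n_clusters):
            if np.any(lab == c):
                centers[c] = emb[lab == c].mean(0)
    return lab

def weighted_knn_predict(train_xy, train_labels, query_xy, k, n_clusters):
    """k-NN on coordinates with inverse-distance vote weights (assumed weighting)."""
    d = np.sqrt(((query_xy[:, None, :] - train_xy[None, :, :]) ** 2).sum(-1))
    idx = np.argsort(d, axis=1)[:, :k]
    w = 1.0 / (np.take_along_axis(d, idx, axis=1) + 1e-12)
    votes = np.zeros((len(query_xy), n_clusters))
    for c in range(n_clusters):
        votes[:, c] = (w * (train_labels[idx] == c)).sum(axis=1)
    return votes.argmax(axis=1)

# Synthetic example: 2000 locations, one attribute with two spatial regimes.
rng = np.random.default_rng(0)
coords = rng.uniform(0.0, 1.0, size=(2000, 2))
truth = (coords[:, 0] >= 0.5).astype(int)
values = (5.0 * truth + rng.normal(0.0, 0.5, size=2000))[:, None]

# Anchor locations: a regular 10 x 10 grid over the study domain (assumption).
g = np.linspace(0.05, 0.95, 10)
anchors = np.array([(x, y) for x in g for y in g])

anchor_features = kernel_average(coords, values, anchors, bandwidth=0.05)
anchor_labels = spectral_labels(anchor_features, n_clusters=2, sigma=1.0)
labels = weighted_knn_predict(anchors, anchor_labels, coords, k=5, n_clusters=2)
```

The eigendecomposition here runs on a 100 x 100 matrix rather than a 2000 x 2000 one, which is the source of the computational saving the abstract describes. Because the classifier uses only geographical coordinates, the labels it assigns to the full dataset vary smoothly in space.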

Author information

Corresponding author

Correspondence to Francky Fouedjio.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Fouedjio, F. (2017). A Spectral Clustering Method for Large-Scale Geostatistical Datasets. In: Perner, P. (ed.) Machine Learning and Data Mining in Pattern Recognition. MLDM 2017. Lecture Notes in Computer Science, vol 10358. Springer, Cham. https://doi.org/10.1007/978-3-319-62416-7_18

  • DOI: https://doi.org/10.1007/978-3-319-62416-7_18

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-62415-0

  • Online ISBN: 978-3-319-62416-7

  • eBook Packages: Computer Science (R0)
