A Computational Comparison of Parallel and Distributed K-median Clustering Algorithms on Large-Scale Image Data

  • Conference paper
Mathematical Optimization Theory and Operations Research (MOTOR 2019)

Abstract

The most commonly used clustering algorithms include those aimed at solving the well-known k-median problem. Their main advantages are that they are simple to implement and use and that they allow a flexible choice of dissimilarity measures (not necessarily metrics). K-median algorithms are also known to be more robust to noise and outliers than k-means algorithms. Despite this, they have seen limited use in large-scale clustering problems because of their high computational and space complexity. This work presents a computational comparison of k-median clustering algorithms in a specific large-scale setting: clustering large image collections. We implement distributed versions of the most common k-median clustering algorithms and compare them with our parallel heuristic for solving large-scale k-median problem instances. We analyze the clustering results with respect to external evaluation measures and run time.
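
For readers unfamiliar with the problem, the following minimal Python sketch illustrates what a k-median heuristic computes. It assumes the discrete variant, in which medians are chosen among the data points, and a precomputed dissimilarity matrix that need not be a metric. The function names (`k_median_cost`, `alternate_k_median`) and the `init` parameter are hypothetical, introduced only for this illustration. This is a simple alternating scheme, not the distributed algorithms or the parallel heuristic compared in the paper.

```python
import random


def k_median_cost(dist, medians):
    """Sum over all points of the dissimilarity to the nearest chosen median."""
    return sum(min(row[m] for m in medians) for row in dist)


def assign(dist, medians):
    """Label each point with the position (0..k-1) of its nearest median."""
    k = len(medians)
    return [min(range(k), key=lambda c: dist[i][medians[c]])
            for i in range(len(dist))]


def alternate_k_median(dist, k, init=None, seed=0, max_iter=100):
    """Alternating heuristic on a precomputed dissimilarity matrix.

    dist[i][j] is the dissimilarity between points i and j.
    init (hypothetical parameter) optionally fixes the starting medians;
    otherwise k distinct points are sampled at random.
    Returns (list of median indices, cluster label per point).
    """
    n = len(dist)
    rng = random.Random(seed)
    medians = list(init) if init is not None else rng.sample(range(n), k)
    for _ in range(max_iter):
        labels = assign(dist, medians)
        new_medians = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            if not members:
                # Empty cluster: keep the previous median unchanged.
                new_medians.append(medians[c])
                continue
            # Pick the member with the smallest total dissimilarity to its cluster.
            best = min(members, key=lambda j: sum(dist[i][j] for i in members))
            new_medians.append(best)
        if new_medians == medians:
            break
        medians = new_medians
    return medians, assign(dist, medians)


# Toy data: two well-separated groups on a line.
points = [0, 1, 2, 10, 11, 12]
dist = [[abs(a - b) for b in points] for a in points]
medians, labels = alternate_k_median(dist, k=2, init=[0, 1])
print(medians, labels, k_median_cost(dist, medians))
```

Because the sketch only reads entries of `dist`, any dissimilarity measure can be substituted for the absolute difference used in the toy data. The full matrix takes quadratic space in the number of points, which is one reason such methods are costly at large scale.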



Acknowledgement

This work is supported by the Russian Science Foundation under grant 17-71-10176.

Author information

Corresponding author

Correspondence to Anton V. Ushakov.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Ushakov, A.V., Vasilyev, I. (2019). A Computational Comparison of Parallel and Distributed K-median Clustering Algorithms on Large-Scale Image Data. In: Bykadorov, I., Strusevich, V., Tchemisova, T. (eds) Mathematical Optimization Theory and Operations Research. MOTOR 2019. Communications in Computer and Information Science, vol 1090. Springer, Cham. https://doi.org/10.1007/978-3-030-33394-2_10

  • DOI: https://doi.org/10.1007/978-3-030-33394-2_10

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-33393-5

  • Online ISBN: 978-3-030-33394-2

  • eBook Packages: Computer Science, Computer Science (R0)
