
DXML: Distributed Extreme Multilabel Classification

  • Conference paper
Big Data Analytics (BDA 2021)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 13147)


Abstract

As a big data application, extreme multilabel classification has emerged as an important research topic, with applications in ranking and recommendation of products and items. We propose a scalable hybrid distributed- and shared-memory implementation of extreme classification for large-scale ranking and recommendation. In particular, the implementation combines message passing with MPI across nodes and multithreading with OpenMP within each node. Expressions for communication latency and communication volume are derived, and the available shared-memory parallelism is analyzed using the work-span model. This analysis sheds light on the expected scalability of similar extreme classification methods. Experiments show that the implementation is comparatively fast to train and test on several large datasets, and in some cases the resulting model is comparatively small.
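The paper's specific cost expressions are not reproduced on this page. For orientation, analyses of this kind are typically built on two standard textbook models: the latency-bandwidth (alpha-beta) model for inter-node communication and the work-span (Brent) bound for shared-memory parallelism. The following is a sketch of these generic forms, not the paper's own derivation:

```latex
% Latency-bandwidth model: cost of sending one message of n words,
% where \alpha is the per-message latency and \beta the per-word transfer time.
T_{\mathrm{msg}}(n) = \alpha + \beta n

% Work-span model: with total work T_1, critical-path length (span) T_\infty,
% and p threads, a greedy scheduler satisfies Brent's bound
T_p \le \frac{T_1}{p} + T_\infty,
\qquad
\text{speedup} \;\le\; \min\!\left(p,\; \frac{T_1}{T_\infty}\right).
```

Under these models, an algorithm scales well when its communication volume grows slowly with the number of nodes and its span $T_\infty$ is small relative to its work $T_1$.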

Code: https://github.com/misterpawan/DXML



Acknowledgement

This work was done at IIIT Hyderabad using an IIIT seed grant. The author acknowledges all the support provided by the institute. This project was partially supported by the RIPPLE center of excellence at IIIT Hyderabad.

Author information

Corresponding author: Pawan Kumar.


Copyright information

© 2021 Springer Nature Switzerland AG

About this paper


Cite this paper

Kumar, P. (2021). DXML: Distributed Extreme Multilabel Classification. In: Srirama, S.N., Lin, J.C.W., Bhatnagar, R., Agarwal, S., Reddy, P.K. (eds) Big Data Analytics. BDA 2021. Lecture Notes in Computer Science, vol. 13147. Springer, Cham. https://doi.org/10.1007/978-3-030-93620-4_22

Download citation

  • DOI: https://doi.org/10.1007/978-3-030-93620-4_22

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-93619-8

  • Online ISBN: 978-3-030-93620-4

  • eBook Packages: Computer Science (R0)
