Finding the Optimal Cardinality Value for Information Bottleneck Method

  • Conference paper
Advanced Data Mining and Applications (ADMA 2006)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4093)


Abstract

The Information Bottleneck method can be used as a dimensionality reduction approach by grouping “similar” features together [1]. In application, a natural question is how many “feature groups” are appropriate. The dependence on such prior knowledge restricts the applicability of many Information Bottleneck algorithms. In this paper we alleviate this dependence by formulating the determination of this parameter as a model selection problem and solving it with the minimum message length (MML) principle. An efficient encoding scheme is designed to describe both the information bottleneck solutions and the original data, and the minimum message length principle is then applied to determine the optimal cardinality value automatically. Empirical results in a document clustering scenario indicate that the proposed method works well for determining the optimal cardinality for the Information Bottleneck method.
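The overall recipe in the abstract — compute an Information Bottleneck clustering at each candidate cardinality, score each solution by a two-part message length, and keep the cardinality with the shortest total description — can be illustrated with a small sketch. The greedy merging below follows the agglomerative IB of [13]; the `message_length` function is a deliberately simplified two-part score (a model cost growing with the number of clusters plus an n·H(Y|T) data cost), not the paper's actual encoding scheme, and the sample count `n` is an assumed parameter.

```python
import numpy as np

def mutual_info(pxy):
    """I(X;Y) in nats for a joint distribution (rows index X, columns Y)."""
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    mask = pxy > 0
    return float((pxy[mask] * np.log(pxy[mask] / (px @ py)[mask])).sum())

def agglomerative_ib(pxy, k):
    """Greedy agglomerative IB: repeatedly merge the pair of feature
    clusters whose merge preserves the most mutual information I(T;Y)."""
    clusters = [[i] for i in range(pxy.shape[0])]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                merged = [c for i, c in enumerate(clusters) if i not in (a, b)]
                merged.append(clusters[a] + clusters[b])
                pty = np.array([pxy[c].sum(axis=0) for c in merged])
                retained = mutual_info(pty)  # information kept after this merge
                if best is None or retained > best[0]:
                    best = (retained, merged)
        clusters = best[1]
    return clusters

def message_length(pxy, clusters, n):
    """Crude two-part message length (nats): a per-cluster model cost of
    log(n), plus a data cost n * H(Y|T) for encoding the data given the
    clustered representation. A simplification, not the paper's scheme."""
    pty = np.array([pxy[c].sum(axis=0) for c in clusters])
    pt = pty.sum(axis=1, keepdims=True)
    cond = pty / np.where(pt > 0, pt, 1.0)
    h_y_given_t = -float((pty[cond > 0] * np.log(cond[cond > 0])).sum())
    return len(clusters) * np.log(n) + n * h_y_given_t

# Toy joint p(word, topic): two blocks of interchangeable words.
P = np.array([[0.15, 0.01]] * 3 + [[0.01, 0.15]] * 3)
P = P / P.sum()
lengths = {k: message_length(P, agglomerative_ib(P, k), n=1000)
           for k in range(1, 7)}
best_k = min(lengths, key=lengths.get)  # MML-optimal cardinality: 2
```

On the toy joint, merging words with identical conditionals p(topic | word) loses no mutual information, so the two-cluster solution keeps the full I(X;Y) while paying only two clusters' worth of model cost, and the message length picks out k = 2 over both the over-coarse (k = 1) and over-fine (k > 2) alternatives.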


References

  1. Tishby, N., Pereira, F., Bialek, W.: The information bottleneck method. In: Proc. 37th Annual Allerton Conference on Communication, Control, and Computing (1999)

  2. Gordon, S., Greenspan, H., Goldberger, J.: Applying the information bottleneck principle to unsupervised clustering of discrete and continuous image representations. In: Proceedings of the Ninth IEEE International Conference on Computer Vision (ICCV), vol. 2 (2003)

  3. Goldberger, J., Greenspan, H., Gordon, S.: Unsupervised image clustering using the information bottleneck method. In: Van Gool, L. (ed.) DAGM 2002. LNCS, vol. 2449. Springer, Heidelberg (2002)

  4. Slonim, N., Tishby, N.: Document clustering using word clusters via the information bottleneck method. In: Proc. of the 23rd Ann. Int. ACM SIGIR Conf. on Research and Development in Information Retrieval, pp. 208–215 (2000)

  5. Verbeek, J.J.: An information theoretic approach to finding word groups for text classification. Master's thesis, Institute for Logic, Language and Computation, University of Amsterdam (2000)

  6. Niu, Z.Y., Ji, D.H., Tan, C.L.: Document clustering based on cluster validation. In: CIKM 2004: Proceedings of the Thirteenth ACM International Conference on Information and Knowledge Management, pp. 501–506. ACM Press, New York (2004)

  7. Schneidman, E., Slonim, N., de Ruyter van Steveninck, R.R., Tishby, N., Bialek, W.: Analyzing neural codes using the information bottleneck method (unpublished manuscript, 2001)

  8. Slonim, N., Tishby, N.: The power of word clusters for text classification. Technical report, School of Computer Science and Engineering and The Interdisciplinary Center for Neural Computation, The Hebrew University, Jerusalem (2001)

  9. Tishby, N., Slonim, N.: Data clustering by Markovian relaxation and the information bottleneck method. Advances in Neural Information Processing Systems (NIPS) 13 (2000)

  10. Slonim, N., Friedman, N., Tishby, N.: Unsupervised document classification using sequential information maximization. In: Proc. of the 25th Ann. Int. ACM SIGIR Conf. on Research and Development in Information Retrieval (2002)

  11. Cover, T.M., Thomas, J.A.: Elements of Information Theory. Wiley, New York (1991)

  12. Slonim, N.: The Information Bottleneck: Theory and Applications. PhD thesis, The Hebrew University of Jerusalem (2002)

  13. Slonim, N., Tishby, N.: Agglomerative information bottleneck. Advances in Neural Information Processing Systems (NIPS) 12, 617–623 (1999)

  14. Wallace, C., Freeman, P.R.: Estimation and inference by compact coding. Journal of the Royal Statistical Society 49, 223–265 (1987)

  15. Wallace, C., Boulton, D.: An information measure for classification. Computer Journal 11, 185–194 (1968)

  16. Rissanen, J.: A universal prior for integers and estimation by minimum description length. Annals of Statistics 11, 416–431 (1983)

  17. Lang, K.: Learning to filter netnews. In: Proc. of the 12th International Conf. on Machine Learning, pp. 331–339 (1995)

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Li, G., Liu, D., Tu, Y., Ye, Y. (2006). Finding the Optimal Cardinality Value for Information Bottleneck Method. In: Li, X., Zaïane, O.R., Li, Z. (eds) Advanced Data Mining and Applications. ADMA 2006. Lecture Notes in Computer Science, vol 4093. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11811305_66

  • DOI: https://doi.org/10.1007/11811305_66

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-37025-3

  • Online ISBN: 978-3-540-37026-0

  • eBook Packages: Computer Science (R0)