Fast Information-Theoretic Agglomerative Co-clustering

Conference paper, Databases Theory and Applications (ADC 2014)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8506)

Abstract

Jointly clustering the rows and the columns of large matrices, a.k.a. co-clustering, has numerous real-world applications, such as collaborative filtering, market-basket and micro-array data analysis, and graph clustering. In this paper, we formulate an information-theoretic cost function for this problem and develop a fast agglomerative algorithm to optimize it. Our algorithm uses Locality-Sensitive Hashing to rapidly find highly similar clusters, which it merges iteratively. Thanks to its bottom-up nature, it also enables the analysis of cluster hierarchies. Finally, the numbers of row and column clusters are determined automatically, without requiring the user to choose them. Our experiments on both real and synthetic datasets show that the proposed algorithm achieves high-quality clustering solutions and scales linearly with the input matrix size.
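The abstract does not spell out the cost function or the LSH family, so the following is only a minimal sketch under stated assumptions, not the authors' algorithm. It scores a candidate merge of two row clusters by the resulting loss in the mutual information I(X;Y) between row clusters and columns (the criterion used in information-theoretic co-clustering by Dhillon et al. and in the agglomerative information bottleneck of Slonim and Tishby), and it uses banded random-hyperplane (SimHash) LSH on the clusters' count profiles to restrict the search to likely-similar pairs. All function and parameter names (`mutual_information`, `merge_loss`, `simhash_candidates`, `agglomerate_rows`, `n_bands`, `bits_per_band`, `target_k`) are hypothetical. Two further simplifications: only rows are merged (running it on the transposed matrix clusters the columns, whereas a real co-clustering method would interleave the two), and merging stops at a user-supplied `target_k`, whereas the paper determines the numbers of clusters automatically.

```python
import math
import random
from collections import defaultdict
from itertools import combinations


def mutual_information(joint):
    """I(X;Y) in bits for a nonnegative count matrix given as a list of rows."""
    total = sum(sum(row) for row in joint)
    if total == 0:
        return 0.0
    row_sums = [sum(row) for row in joint]
    col_sums = [sum(col) for col in zip(*joint)]
    mi = 0.0
    for i, row in enumerate(joint):
        for j, v in enumerate(row):
            if v > 0:
                mi += (v / total) * math.log2(v * total / (row_sums[i] * col_sums[j]))
    return mi


def row_contribution(row, col_sums, total):
    """Share of I(X;Y) contributed by one row cluster (its summed count vector)."""
    rs = sum(row)
    return sum((v / total) * math.log2(v * total / (rs * c))
               for v, c in zip(row, col_sums) if v > 0)


def merge_loss(a, b, col_sums, total):
    """Drop in I(X;Y) when row clusters a and b merge; column sums stay unchanged."""
    merged = [x + y for x, y in zip(a, b)]
    return (row_contribution(a, col_sums, total)
            + row_contribution(b, col_sums, total)
            - row_contribution(merged, col_sums, total))


def simhash_candidates(vectors, n_bands, bits_per_band, rng):
    """Candidate pairs from banded random-hyperplane (SimHash) LSH, i.e. cosine similarity."""
    dim = len(vectors[0])
    buckets = defaultdict(set)
    for band in range(n_bands):
        # Fresh Gaussian hyperplanes per band; the sign pattern is the band's signature.
        planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(bits_per_band)]
        for idx, vec in enumerate(vectors):
            sig = tuple(sum(p * x for p, x in zip(plane, vec)) >= 0 for plane in planes)
            buckets[(band, sig)].add(idx)
    pairs = set()
    for members in buckets.values():
        pairs.update(combinations(sorted(members), 2))  # every pair sharing a bucket
    return pairs


def agglomerate_rows(matrix, target_k, n_bands=8, bits_per_band=4, seed=0):
    """Greedily merge the LSH-candidate pair of row clusters with the smallest MI loss."""
    rng = random.Random(seed)
    total = sum(sum(row) for row in matrix)
    col_sums = [sum(col) for col in zip(*matrix)]
    clusters = [[i] for i in range(len(matrix))]  # original row ids per cluster
    profiles = [list(row) for row in matrix]       # summed counts per cluster
    while len(clusters) > target_k:
        pairs = simhash_candidates(profiles, n_bands, bits_per_band, rng)
        if not pairs:  # no collisions: fall back to exhaustive search
            pairs = set(combinations(range(len(clusters)), 2))
        i, j = min(pairs, key=lambda p: merge_loss(profiles[p[0]], profiles[p[1]],
                                                  col_sums, total))
        # Merge cluster j into cluster i; i < j, so deleting j keeps index i valid.
        profiles[i] = [x + y for x, y in zip(profiles[i], profiles[j])]
        clusters[i].extend(clusters[j])
        del profiles[j], clusters[j]
    return clusters
```

For example, `agglomerate_rows([[5, 5, 0, 0], [4, 6, 0, 0], [0, 0, 5, 5], [0, 0, 6, 4]], target_k=2)` should group rows 0 and 1 together and rows 2 and 3 together, because merging rows with near-proportional profiles loses little mutual information while merging rows from different blocks loses a lot. In this sketch LSH only narrows the candidate set; the exact loss still decides which pair is merged, with an exhaustive fallback when no pair collides.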

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Gao, T., Akoglu, L. (2014). Fast Information-Theoretic Agglomerative Co-clustering. In: Wang, H., Sharaf, M.A. (eds) Databases Theory and Applications. ADC 2014. Lecture Notes in Computer Science, vol 8506. Springer, Cham. https://doi.org/10.1007/978-3-319-08608-8_13

  • DOI: https://doi.org/10.1007/978-3-319-08608-8_13

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-08607-1

  • Online ISBN: 978-3-319-08608-8

  • eBook Packages: Computer Science, Computer Science (R0)
