Fast Information-Theoretic Agglomerative Co-clustering

Gao, Tiantian; Akoglu, Leman

doi:10.1007/978-3-319-08608-8_13

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8506)

Included in the following conference series:

Australasian Database Conference

Abstract

Jointly clustering the rows and the columns of large matrices, a.k.a. co-clustering, has numerous real-world applications, such as collaborative filtering, market-basket and micro-array data analysis, and graph clustering. In this paper, we formulate an information-theoretic cost function for this problem and develop a fast agglomerative algorithm to optimize it. Using Locality-Sensitive Hashing, our algorithm rapidly finds highly similar clusters and merges them iteratively. Thanks to its bottom-up nature, it also enables the analysis of the cluster hierarchies. Finally, the numbers of row and column clusters are determined automatically, without requiring the user to choose them. Our experiments on both real and synthetic datasets show that the proposed algorithm achieves high-quality clustering solutions and scales linearly with the input matrix size.
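
The abstract does not say which hash family the algorithm uses, so the following is only a minimal sketch of the general idea of using Locality-Sensitive Hashing to shortlist similar rows as merge candidates. It assumes a binary matrix stored as sets of column indices and uses MinHash with banding; the function names (minhash_signature, candidate_pairs) and the parameters (num_hashes, bands) are hypothetical and are not taken from the paper.

```python
import random
from collections import defaultdict
from itertools import combinations

PRIME = 2_147_483_647  # Mersenne prime 2^31 - 1; column indices must be smaller


def minhash_signature(items, hash_params):
    """MinHash signature of a non-empty set of non-negative integer indices."""
    return tuple(min((a * x + b) % PRIME for x in items) for a, b in hash_params)


def candidate_pairs(rows, num_hashes=20, bands=10, seed=0):
    """Return pairs of row ids whose signatures agree on at least one band.

    rows: dict mapping a row id to the set of column indices holding a 1.
    Assumption for this sketch: MinHash is the hash family; the paper's
    actual choice of hash functions is not stated in the abstract.
    """
    rng = random.Random(seed)
    # a is drawn from [1, PRIME) so each hash is injective on [0, PRIME)
    hash_params = [(rng.randrange(1, PRIME), rng.randrange(PRIME))
                   for _ in range(num_hashes)]
    width = num_hashes // bands
    buckets = defaultdict(list)
    for row_id, items in rows.items():
        if not items:
            continue  # an empty row has no MinHash signature
        sig = minhash_signature(items, hash_params)
        for band in range(bands):
            key = (band, sig[band * width:(band + 1) * width])
            buckets[key].append(row_id)
    pairs = set()
    for members in buckets.values():
        for u, v in combinations(sorted(members), 2):
            pairs.add((u, v))
    return pairs


rows = {
    "r1": {0, 1, 2, 3},
    "r2": {0, 1, 2, 3},  # identical to r1, so it always shares every band
    "r3": {10, 11, 12},  # disjoint from r1 and r2
}
pairs = candidate_pairs(rows)
```

In an agglomerative setting, only the shortlisted pairs would then be scored against the cost function, which avoids comparing every pair of clusters at each step. How the paper scores and merges candidates is not described on this page.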

Author information

Authors and Affiliations

Department of Computer Science, Stony Brook University, USA
Tiantian Gao & Leman Akoglu

Editor information

Editors and Affiliations

Centre for Applied Informatics (CAI), College of Engineering and Science, Victoria University, Ballarat Road, 8001, Footscray, VIC, Australia
Hua Wang
Faculty of Engineering, Architecture and Information Technology, School of Information Technology and Electrical Engineering, The University of Queensland, St. Lucia, 4072, Brisbane, QLD, Australia
Mohamed A. Sharaf

Cite this paper

Gao, T., Akoglu, L. (2014). Fast Information-Theoretic Agglomerative Co-clustering. In: Wang, H., Sharaf, M.A. (eds) Databases Theory and Applications. ADC 2014. Lecture Notes in Computer Science, vol 8506. Springer, Cham. https://doi.org/10.1007/978-3-319-08608-8_13

DOI: https://doi.org/10.1007/978-3-319-08608-8_13
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-08607-1
Online ISBN: 978-3-319-08608-8
eBook Packages: Computer Science, Computer Science (R0)
